Abstract:

We present a determination of the parton distributions of the nucleon from a global 
set of hard scattering data using the NNPDF methodology: NNPDF2.0. Experimental 
data include deep-inelastic scattering with the combined HERA-I dataset, fixed target 
Drell-Yan production, collider weak boson production and inclusive jet production.
Next-to-leading order QCD is used throughout without resorting to K-factors. We present
and utilize an improved fast algorithm for the solution of evolution equations and the 
computation of general hadronic processes. We introduce improved techniques for the 
training of the neural networks which are used as parton parametrization, and we use 
a novel approach for the proper treatment of normalization uncertainties. We assess 
quantitatively the impact of individual datasets on PDFs. We find very good consistency 
of all datasets with each other and with NLO QCD, with no evidence of tension between 
datasets. Some PDF combinations relevant for LHC observables turn out to be determined 
rather more accurately than in any other parton fit. 
1 Introduction



Over the last several years, we have developed a novel approach pQ to the determination 
of parton distribution functions (PDFs), which combines a Monte Carlo representation 
of the probability measure in the space of PDFs with the use of neural networks as a 
set of unbiased basis functions (the NNPDF methodology, henceforth). The method was 
developed, refined, and applied to problems of increasing complexity: the parametrization 
of a single structure function [1], of several structure functions [2] and the determination
of the nonsinglet parton distribution [3]. Eventually, in Ref. [I] a first complete set of
parton distributions was constructed, using essentially all the then-available deep-inelastic
scattering (DIS) data. This parton set, NNPDF1.0, included five independent parton dis- 
tributions (the two lightest flavours and antiflavours and the gluon) . It was then extended 
in Refs. [5,6] to also include an independent parametrization of the strange and
antistrange quarks, with heavier flavours determined dynamically (NNPDF1.2 parton set). All
NNPDF parton sets are available through the LHAPDF interface [7,8]. In these works, as
well as in studies for the HERA-LHC workshop [9], it was shown that PDFs determined 
using the NNPDF methodology enjoy several desirable features: the Monte Carlo behaves 
in a statistically consistent way (e.g., uncertainties scale as expected with the size of the 
sample) [H|6]; results are demonstrably independent of the parton parametrization [HE]; 
PDFs behave as expected upon the addition of new data (e.g. uncertainties expand when 
data are removed and shrink when they are added unless the new data is incompatible 
with the old) [H[9] and results are even stable upon the addition of new independent PDF 
parametrizations [UH]. 

With PDF uncertainties under control, detailed precision physics studies become pos- 
sible, such as for instance the determination of CKM matrix elements [6]. However, the 
requirements of precision physics are such that it is mandatory to exploit all the available 
information in PDF determination. Specifically, it has been known for a long time (see 
Ref. [1] for references to the earlier literature) that DIS data are insufficient to determine 
accurately many aspects of PDFs, such as the flavour decomposition of the quark and 
antiquark sea or the gluon distribution, especially at large x: indeed, the current
state-of-the-art PDF determinations, such as CTEQ6.6 [10] and MSTW2008 [11], are based on
global fits, in which hadronic data are included along with DIS data. 

In this paper we present a PDF determination using NNPDF methodology based on a 
global fit. The data used for fitting include, on top of all the data used in Ref. [6] (DIS data 
and "dimuon" charm neutrino production data) also hadronic data, specifically Drell-Yan 
(DY), W and Z production and Tevatron inclusive jets. We also replace the separate 
ZEUS and H1 datasets with the recently published HERA-I combined dataset [12]. The
dataset used in this parton determination is thus comparable in variety and size (and is
in fact slightly larger) to that used by the CTEQ [10] and MSTW groups [11].

The PDF determination presented here is based on a consistent use of NLO QCD. 
This is novel in the context of a global parton determination: indeed, in other parton 
fits such as Refs. [10,11] only DIS data are treated using fully NLO QCD, while several
sets of hadronic data are treated using LO theory improved through K-factors. The main
bottleneck in the use of NLO theory for hadronic processes is the speed in the computation 
of hadron-level observables, which requires a convolution of the PDF of both incoming 
hadrons with parton-level cross sections. The use of Mellin-space techniques (as e.g. in 
Ref. [13]) solves this problem, but at the cost of limiting the flexibility of the acceptable
PDF parametrization: specifically, the very flexible neural network method of Refs. [4,6]
parametrizes PDFs in x space. Efficient fast methods to overcome this hurdle have been
suggested (see [14], in [15]), based on the idea of precomputing and storing the convolution
with a set of basis functions over which any PDF can be expanded. These methods have
been implemented in fast public codes for specific processes, such as FastNLO [16] for jet
production, and very recently in a general-purpose interface, APPLGRID [17].

In this paper, we use similar ideas to fully exploit the powerful parton evolution method 
introduced in Ref. [3], based on the convolution of PDFs with a pre-computed kernel, 
determined using Mellin-space techniques. This gives us a new approach, which we call 
the FastKernel method, which we use both for parton evolution, and for the computation 
of DIS and DY physical observables. The FastKernel method leads to a considerable 
increase in speed in comparison to Refs. [HE] for DIS data, and it makes possible for the 
first time to use exact NLO theory for DY in a global parton fit. 

Thanks to the FastKernel method, we are able to produce a first fully NLO global 
parton set using NNPDF methodology: the NNPDF2.0 parton set. This parton deter- 
mination enjoys the same desirable features of the previous NNPDF1.0 and NNPDF1.2 
PDF sets, with which in particular it is fully compatible, though uncertainties are now 
significantly smaller, and in fact sometimes also rather smaller than those of other existing 
global fits. Thanks to the use of a Monte Carlo methodology, it is possible to perform 
a detailed comparison of NNPDF2.0 PDFs with those of previous NNPDF fits, and in 
particular to assess the impact of the various new aspects of this parton determination, 
both due to improved methodology and the use of more precise data and a wider dataset. 
Perhaps the most striking feature of the NNPDF2.0 parton determination is the fact that 
it is free of tension between different datasets and NLO QCD: in fact, whereas the ad- 
dition of new data leads to sizable error reduction, we do not find any evidence of any 
individual dataset being incompatible with the others, nor for the distribution of fit re- 
sults to contradict statistical expectations. Specifically, any combination or subset of the 
data included in the global analysis can be fitted using the same methodology, and results 
obtained fitting to various subsets of data are all compatible with each other. 

Whereas we refer to the previous NNPDF papers [HE] for a general introduction to
the NNPDF methodology, all the new aspects of the NNPDF2.0 parton determination are 
fully documented in this paper. In particular, in Sect. 2 we discuss the features of the
new data used here, and specifically the kinematics of DY and jet data. In Sect. 3 we
discuss in detail the FastKernel method, and its application to parton evolution and the
computation of DIS and DY observables. In Sect. 4 we discuss several improvements in
the techniques that ensure that the quality of the fit to different data is balanced, which 
are made necessary by the greater complexity of the NNPDF2.0 dataset. 

Readers who are mostly interested in using the PDFs, rather than in the details of parton
determination and the NNPDF methodology, should skip directly to Sect. 5, where our
results are presented. In that section, after comparing the NNPDF2.0 PDF set both with
previous NNPDF sets and with current MSTW and CTEQ PDFs, we turn to a series 
of studies of its features. Specifically, we study possible non-gaussian behaviour of our 
results by comparing standard deviations with confidence level intervals; we assess one 
by one the impact on the new fit of the aforementioned improved fitting method, of an 
improved treatment of normalization uncertainties discussed elsewhere and used here [18] , 
of the new combined HERA data, and of the addition of either jet or DY data; we discuss
the impact of positivity constraints; and we discuss the dependence of our results on the
value of $\alpha_s$.

Finally, in Sect. 6 we perform some preliminary studies of the phenomenological
implications of this PDF determination: after briefly summarizing the quality of the agreement
between data and theory for the processes used in the fit, we reassess the implication of
our improved strangeness determination for the so-called NuTeV anomaly [19,20], and
we discuss some LHC standard candles. Some statistical tools and a brief summary of
factorization and kinematics for the DY process are collected in appendices. 
2 Experimental data



The NNPDF2.0 parton determination includes both deep-inelastic scattering (DIS) data and
hadronic data: fixed-target Drell-Yan production, and Tevatron collider weak vector boson
and inclusive jet production. The DIS dataset only differs from that used in the previous
NNPDF1.2 [6] PDF determination in the replacement of the separate H1 and ZEUS datasets
with the combined HERA-I dataset of Ref. [12].

The treatment of experimental data in the present fit follows Ref. [1], with the exception
of normalization uncertainties, which are treated using the improved method presented in
Ref. [18], the so-called $t_0$ method. All information on correlated systematic errors, when
available, is included in our fit.

In this section first we introduce the dataset and the way we construct the experi- 
mental covariance matrix. Then we discuss the details of the new datasets used in the 
NNPDF2.0 analysis as compared to previous work, and finally we show how the Monte 
Carlo generation of replicas of experimental data is used to construct the sampling of the 
available experimental information. 

2.1 Dataset, uncertainties and correlations 

The dataset used for the present fit is summarized in Table 1, where experimental data are
separated into DIS data, fixed target Drell-Yan production, collider weak boson production
and inclusive jet production. For each dataset we provide the number of points both before
and after kinematic cuts, and their kinematic ranges. The same kinematical cuts as in [3]
are applied to DIS data, namely
$Q^2 > Q_0^2 = m_c^2 = 2$ GeV$^2$ and $W^2 > 12.5$ GeV$^2$, while no cuts are applied to the hadronic data.
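
As an illustration of how such cuts act on a single DIS point, the following minimal Python sketch computes $W^2$ from $(x, Q^2)$ with the standard relation $W^2 = m_p^2 + Q^2(1-x)/x$ and applies the two thresholds quoted above. The function names and the sample points are hypothetical and are not taken from the NNPDF code.

```python
M_PROTON = 0.938272  # proton mass in GeV

Q2_MIN = 2.0   # GeV^2, cut quoted in the text
W2_MIN = 12.5  # GeV^2, cut quoted in the text


def w2(x, q2, m_target=M_PROTON):
    """Invariant mass squared of the hadronic final state in DIS."""
    return m_target**2 + q2 * (1.0 - x) / x


def passes_dis_cuts(x, q2):
    """True if the point (x, Q^2) survives both kinematic cuts."""
    return q2 > Q2_MIN and w2(x, q2) > W2_MIN


# Hypothetical (x, Q^2 [GeV^2]) points, for illustration only.
points = [(0.01, 5.0), (0.7, 10.0), (0.1, 1.5)]
kept = [p for p in points if passes_dis_cuts(*p)]
```

In this toy list only the first point survives: the second fails the $W^2$ cut (large $x$ at moderate $Q^2$) and the third fails the $Q^2$ cut.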

For hadronic data we use the LO partonic kinematics to estimate the effective range of
Bjorken-x which each dataset spans (see Sect. 2.2 below for a definition of the pertinent
kinematic variables). In Fig. 1 we show a scatter plot of the data, which demonstrates
that the kinematic coverage is now much more extended than in the DIS-only NNPDF1.2
fit.

The DIS data of Table 1 and Fig. 1 differ from the NNPDF1.2 set because of the
replacement of all ZEUS and H1 data from the HERA-I run with the combined set of
Ref. [12]. The combined HERA-I dataset has a better accuracy than that expected on
purely statistical grounds from the combination of previous H1 and ZEUS data because
of the reduction of systematic errors from the cross-calibration of the two experiments.
These data are given with 110 correlated systematic uncertainties and three correlated
procedural uncertainties, which we fully include in the covariance matrix. The remaining
DIS data are the same as in Ref. [6], to which we refer for further details. Hadronic data
are discussed in greater detail in Sect. 2.2 below.

In Table 2 we show the percentage average experimental uncertainties for each dataset,
where uncertainties are separated into statistical (which includes uncorrelated systematic),
correlated systematic and normalization uncertainties. As in the case of Table 1, for the
DIS datasets we provide the values with and without kinematical cuts, if different.

The covariance matrix is computed for all the data included in the fit, as discussed
in Ref. [1]. An important difference in comparison to [3] is the improved treatment of
normalization uncertainties. Following [18], the covariance matrix for each experiment is
Figure 1: Experimental data which enter the NNPDF2.0 analysis (Table 1). For hadronic data,
the values of $x_1$ and $x_2$ determined by leading order partonic kinematics (Eqs. (3), (4) and (12))
are plotted (two values per data point).



computed from the knowledge of statistical, systematic and normalization uncertainties 
as follows: 

$$
(\mathrm{cov}_{t_0})_{IJ} = \left( \sum_{l=1}^{N_c} \sigma_{I,l}\,\sigma_{J,l} + \delta_{IJ}\,\sigma_{I,s}^2 \right) F_I F_J
+ \left( \sum_{n=1}^{N_a} \sigma_{I,n}\,\sigma_{J,n} + \sum_{n=1}^{N_r} \sigma_{I,n}\,\sigma_{J,n} \right) F_I^{(0)} F_J^{(0)} , \tag{1}
$$

where $I$ and $J$ run over the experimental points, $F_I$ and $F_J$ are the measured central
values for the observables $I$ and $J$, and $F_I^{(0)}$, $F_J^{(0)}$ are the corresponding observables as
determined from some previous fit.

The uncertainties, given as relative values, are: $\sigma_{I,l}$, the $N_c$ correlated systematic
uncertainties; $\sigma_{I,n}$, the $N_a$ ($N_r$) absolute (relative) normalization uncertainties; and
$\sigma_{I,s}$, the statistical uncertainties (which include uncorrelated systematic uncertainties).
The values of $F_I^{(0)}$ have been determined iteratively, by repeating the fit and using for
$F_I^{(0)}$ at each iteration the results of the previous fit. In practice, convergence of the
procedure is very fast and the final values of $F_I^{(0)}$ used in Eq. (1) do not differ
significantly from the final NNPDF2.0 fit results. Note that thanks to this iterative procedure,
normalization uncertainties can be included in the covariance matrix like all other
systematics, and therefore they do not require the fitting of shift parameters.
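
The structure of Eq. (1) can be made concrete with a short NumPy sketch. It assumes that all uncertainties are supplied as relative values, that systematic and statistical terms multiply the measured values, and that normalization terms multiply the values from a previous fit, as described above. The function name, argument layout and toy numbers are hypothetical and do not reproduce the NNPDF code.

```python
import numpy as np


def t0_covariance(F, F0, stat, sys_corr, norm_abs, norm_rel):
    """Covariance matrix with the structure of Eq. (1).

    F        : (N,)    measured central values
    F0       : (N,)    predictions from a previous fit
    stat     : (N,)    relative statistical uncertainties
    sys_corr : (N, Nc) relative correlated systematic uncertainties
    norm_abs : (N, Na) relative size of the absolute normalization uncertainties
    norm_rel : (N, Nr) relative size of the relative normalization uncertainties
    """
    F, F0, stat, sys_corr, norm_abs, norm_rel = (
        np.asarray(a, dtype=float)
        for a in (F, F0, stat, sys_corr, norm_abs, norm_rel)
    )
    # Systematic and statistical terms multiply the measured values.
    sys_part = sys_corr @ sys_corr.T + np.diag(stat**2)
    # Normalization terms multiply the values from the previous fit.
    norm_part = norm_abs @ norm_abs.T + norm_rel @ norm_rel.T
    return sys_part * np.outer(F, F) + norm_part * np.outer(F0, F0)


# Toy three-point experiment (illustrative numbers only).
F = np.array([1.0, 2.0, 4.0])
F0 = np.array([1.1, 1.9, 4.2])
stat = np.array([0.05, 0.04, 0.03])
sys_corr = np.array([[0.02], [0.02], [0.01]])
norm_abs = np.full((3, 1), 0.03)
norm_rel = np.zeros((3, 0))  # no relative normalization uncertainties here
cov = t0_covariance(F, F0, stat, sys_corr, norm_abs, norm_rel)
```

Because the normalization block depends on `F0` rather than on the data, iterating the fit amounts to recomputing `cov` with the latest predictions until they stop changing.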
The use of this treatment of normalization uncertainties is necessary because of the
presence in the fit of data affected by disparate normalization uncertainties: indeed, the
simpler method used in Refs. [1]-[5] is only accurate [18] when all normalization
uncertainties have a similar size.



2.2 New experimental observables 

The hadronic observables used in the NNPDF2.0 PDF determination correspond to three
classes of processes: Drell-Yan production in fixed target experiments, collider weak vector
boson production, and collider inclusive jet production. For each type of process we briefly
introduce the leading order structure of the observables and kinematics used in Table 1
and Fig. 1, then discuss the features of the data. Full NLO expressions for Drell-Yan
observables are summarized in Appendix B and their fast implementation is presented in
detail in Sect. 3. For jet observables, we interfaced our code with FastNLO [16], by direct
inclusion of the precomputed tables from this reference, to which we refer for explicit
expressions for the cross-sections.



2.2.1 Drell-Yan production on a fixed target 

We consider data for the double-differential distribution in $M$, the invariant mass of the
Drell-Yan lepton pair, and either the rapidity of the pair $y$ or Feynman $x_F$, respectively
defined in terms of the hadronic kinematics as

$$
y = \frac{1}{2}\ln\frac{q_0 + q_z}{q_0 - q_z} ; \qquad x_F = \frac{2 q_z}{\sqrt{s}} , \tag{2}
$$

where $\sqrt{s}$ is the hadron-hadron center-of-mass energy, $q$ is the four-vector of the Drell-Yan
pair and $q_z$ is its projection on the longitudinal axis.

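At leading order, the parton kinematics is entirely fixed in terms of hadronic variables by

$$
x_1^0 = \sqrt{\tau}\, e^{y} = \frac{M}{\sqrt{s}}\, e^{y} , \qquad x_2^0 = \sqrt{\tau}\, e^{-y} = \frac{M}{\sqrt{s}}\, e^{-y} , \tag{3}
$$

or equivalently

$$
x_1^0 = \frac{1}{2}\left( x_F + \sqrt{x_F^2 + 4\tau} \right) , \qquad x_2^0 = \frac{1}{2}\left( -x_F + \sqrt{x_F^2 + 4\tau} \right) . \tag{4}
$$

The corresponding inverse relations are

$$
\tau = x_1^0 x_2^0 ; \qquad M^2 = s\, x_1^0 x_2^0 \tag{5}
$$

and

$$
y = \frac{1}{2}\ln\frac{x_1^0}{x_2^0} ; \qquad x_F = x_1^0 - x_2^0 . \tag{6}
$$

At leading order, the $y$ (or $x_F$) Drell-Yan differential distribution is given by

$$
\frac{d\sigma}{dM^2\,dy}(M^2, y) = \frac{4\pi\alpha^2}{9 M^2 s} \sum_i e_i^2 \left[ q_i(x_1^0, M^2)\,\bar q_i(x_2^0, M^2) + \bar q_i(x_1^0, M^2)\, q_i(x_2^0, M^2) \right] , \tag{7}
$$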
Table 1: Experimental datasets included in the NNPDF2.0 global analysis. For DIS experiments
we provide in each case the number of data points and the ranges of the kinematical variables
before and after (in parentheses) kinematical cuts. For hadronic data we show the ranges of parton
$x$ covered for each set (denoted by $[x_{\min}, x_{\max}]$), determined using leading order parton kinematics
(Eqs. (3), (4) and (12)). Note that hadronic data are unaffected by kinematic cuts. The values of
$x_{\min}$ and $Q^2_{\min}$ for the total dataset hold after imposing kinematic cuts.
Table 2: Average statistical, systematic and normalization uncertainties for each of the
experimental datasets included in NNPDF2.0. Uncorrelated systematic uncertainties are considered as
part of the statistical uncertainty. All uncertainties are given in percentage. Details on the number
of points and the kinematics of each dataset are provided in Table 1. For DIS experiments average
uncertainties are given both before and (in parentheses) after cuts.
where $\alpha$ is the fine-structure constant and $e_i$ the quark electric charges.
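
The leading-order kinematic relations of Eqs. (3)-(6) are easy to check numerically. The sketch below, in plain Python, maps $(M, y)$ or $(M, x_F)$ to the parton momentum fractions and back. The function names are hypothetical, and the sample point (a pair of mass 8 GeV at $\sqrt{s} = 38.8$ GeV) is chosen only for illustration.

```python
import math


def x12_from_rapidity(M, y, sqrt_s):
    """LO momentum fractions from the pair mass and rapidity, as in Eq. (3)."""
    sqrt_tau = M / sqrt_s
    return sqrt_tau * math.exp(y), sqrt_tau * math.exp(-y)


def x12_from_xf(M, x_f, sqrt_s):
    """LO momentum fractions from the pair mass and Feynman x_F, as in Eq. (4)."""
    tau = (M / sqrt_s) ** 2
    root = math.sqrt(x_f**2 + 4.0 * tau)
    return 0.5 * (x_f + root), 0.5 * (-x_f + root)


def rapidity_and_xf(x1, x2):
    """Inverse relations of Eq. (6): returns (y, x_F)."""
    return 0.5 * math.log(x1 / x2), x1 - x2


# Illustrative fixed-target point (values are assumptions, not data).
M, y, sqrt_s = 8.0, 0.3, 38.8
x1, x2 = x12_from_rapidity(M, y, sqrt_s)
y_back, x_f = rapidity_and_xf(x1, x2)
x1_xf, x2_xf = x12_from_xf(M, x_f, sqrt_s)
```

Both parametrizations return the same pair $(x_1^0, x_2^0)$, whose product equals $\tau = M^2/s$, which is the content of Eq. (5).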
The fixed-target Drell-Yan data used for our parton determination are: 



• E605 

This experiment provides the absolute cross section for DY production from a proton
beam on a copper target [31]. The double differential distribution in $y$ and $M^2$ is
given. No correlation matrix is provided, and only a total systematic uncertainty
$\sigma_{\rm sys} = 10\%$ is given. Therefore, we will add statistical and total systematic errors in
quadrature. The only source of correlation between the data points comes from the
absolute normalization uncertainty of 15%. We do not apply any nuclear corrections,
which we expect [6] to be small.

• E866 

This experiment, also known as NuSea, is based on the experimental set-up of the
previous DY experiments E605 [31] and E772 [40]. The absolute cross section
measurements on a proton target are described in [32,33], while the cross section ratio
between deuteron and proton targets can be found in [33]. Double differential
distributions in $x_F$ and $M$ are provided. No correlation matrix is provided, and only
a total systematic uncertainty is given, so we add statistical and total systematic
errors in quadrature. The only source of correlation comes from the 6.5% absolute
normalization uncertainty, which cancels in the cross-section ratio [34].

Note that we do not include fixed target Drell-Yan data from the E772 experiment [40]
nor the deuteron data of E866 [32,33]. These datasets have been shown to have poor
compatibility with other Drell-Yan measurements [13] and thus do not add additional
information to the global PDF analysis. As we have shown elsewhere [1]-[6], within
NNPDF methodology the addition of incompatible data only increases uncertainties, and 
thus these data are not included. The issue of their compatibility with other Drell-Yan 
data will be addressed elsewhere. 



2.2.2 Weak boson production 

We consider the rapidity distributions for W and Z production. At leading order, the
parton kinematics is as in Eqs. (2)-(6), and the differential distribution is given by

$$
\frac{d\sigma}{dy} = \frac{\pi G_F M_V^2 \sqrt{2}}{3 s} \sum_{i,j} c_{ij} \left[ q_i(x_1^0, M_V^2)\,\bar q_j(x_2^0, M_V^2) + \bar q_i(x_1^0, M_V^2)\, q_j(x_2^0, M_V^2) \right] , \tag{9}
$$

where $M_V$ denotes either $M_W$ or $M_Z$; the electroweak couplings are

$$
c_{ij} = |V_{ij}|^2 \quad \text{for } W^\pm , \qquad
c_{ij} = (v_i^2 + a_i^2)\,\delta_{ij} \quad \text{for } Z^0 \text{ unpolarized} , \tag{10}
$$

where $V_{ij}$ are CKM matrix elements and $v_i$, $a_i$ the Z-boson vector and axial couplings.
The weak boson production data included in our parton determination are: 
• D0 Z rapidity distribution

This measurement, performed at Tevatron Run II and described in Ref. [36], gives
the $Z/\gamma^*$ rapidity distribution in the range $71 < M_{ee} < 111$ GeV. The contribution
from the $Z^0/\gamma^*$ interference terms is well below the experimental uncertainties and it
is neglected. No correlation matrix is provided, so we add in quadrature systematic
and statistical uncertainties. The only correlated systematic error is the absolute
normalization uncertainty from the Tevatron luminosity, 6.1%.

• CDF Z rapidity distribution 

This observable is analogous to its D0 counterpart, and it is described in Ref. [37].
For this experiment, $N_{\rm sys} = 11$ independent correlated systematic uncertainties are
provided, which have been used in the construction of the covariance matrix.

• CDF W boson asymmetry 

This measurement, also performed at Tevatron Run II, is described in Ref. [35]. For
this dataset, $N_{\rm sys} = 7$ independent correlated systematic uncertainties are quoted,
from which the experimental correlation matrix can be constructed. The physical
observable is the rapidity asymmetry

$$
A(y_W) = \frac{d\sigma^{W^+}/dy_W - d\sigma^{W^-}/dy_W}{d\sigma^{W^+}/dy_W + d\sigma^{W^-}/dy_W} . \tag{11}
$$

Since the $A(y_W)$ distribution is symmetric at the Tevatron, the experimental data
are folded onto positive rapidities to improve statistics.

Because of the lack of a fast analytic implementation, we do not include lepton-level
data, such as the Tevatron W asymmetries of Refs. [41,42], which have been included in
recent parton fits [11,43] using K-factors. The recent development of the APPLGRID [17]
interface is likely to facilitate the future inclusion of these data in our fits.

2.2.3 Inclusive jet production 

We include data for the inclusive jet production cross section as a function of the transverse
momentum $p_T$ of the jet for fixed rapidity bins $\Delta\eta$. The leading-order parton kinematics
is fixed by

$$
x_1^0 = \frac{p_T}{\sqrt{s}}\, e^{\eta} , \qquad x_2^0 = \frac{p_T}{\sqrt{s}}\, e^{-\eta} , \tag{12}
$$

while a simple leading-order expression for the cross-section is not available because of
the need to provide a jet algorithm.
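
For orientation, the leading-order estimate of the momentum fractions probed by a jet bin can be sketched as follows, assuming the relations $x_{1,2}^0 = (p_T/\sqrt{s})\, e^{\pm\eta}$. The function name and the sample bin are hypothetical; $\sqrt{s} = 1960$ GeV is the Tevatron Run II collider energy.

```python
import math


def jet_x12(p_t, eta, sqrt_s):
    """LO estimate of the parton momentum fractions for a jet of given p_T and rapidity."""
    scale = p_t / sqrt_s
    return scale * math.exp(eta), scale * math.exp(-eta)


# Illustrative bin centre: p_T = 100 GeV, eta = 0.5 at Tevatron Run II.
x1, x2 = jet_x12(100.0, 0.5, 1960.0)
```

Each data point thus contributes two values of $x$, which is how the hadronic points enter a kinematic scatter plot.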
We include the following data: 

• CDF Run II - $k_T$ algorithm

These data are obtained using the $k_T$ algorithm with $R = 0.7$. The dataset and
the various sources of systematic uncertainties have been described in Ref. [38].
We choose to use the $k_T$ algorithm measurements rather than the cone algorithm
measurements [44], since the latter are not infrared safe. Data at $R = 0.7$ are
preferable to available measurements at $R = 0.5$ or $R = 1$ since at Tevatron energies
$R = 0.7$ optimizes the interplay between sensitivity to perturbative radiation and
impact of non-perturbative effects like Underlying Event [45,46].

The data are provided in bins of rapidity $\Delta\eta$ and transverse momentum $p_T$. The
kinematical coverage can be seen in Table 1. On top of the absolute normalization
uncertainty of 5.8%, which is fully correlated among all bins, there are $N_{\rm sys} = 28$
sources of systematic uncertainty, fully correlated among all bins of $p_T$ and $\eta$, used
to construct the covariance matrix.

• D0 Run II - midpoint algorithm

This dataset is obtained using the MidPoint algorithm with $R = 0.7$. The dataset
and the various sources of systematic uncertainties have been described in Ref. [39].
While the MidPoint algorithm is IRC unsafe, the effects of such unsafety in inclusive
distributions are smaller than typical uncertainties [47] and thus it is safe to include
this dataset into the analysis.

The data are provided in bins of rapidity $\Delta\eta$ and transverse momentum $p_T$. The
kinematical coverage can be seen in Table 1. On top of the absolute normalization
uncertainty of 6.1%, which is fully correlated among all bins, there are $N_{\rm sys} = 23$
sources of systematic uncertainty.

No inclusive jet measurements from Run I [48,49] are included. Although their
consistency with Run II data has been debated in the literature [UJEQ], Run II data have
increased statistics, are obtained with a better understanding of the detector, and are
provided with the different sources of systematic uncertainties. The issue of the Tevatron jet
data compatibility will be discussed elsewhere; for the time being, we have checked that 
the NNPDF2.0 fit yields a description of Run I jet data which is reasonably close to that 
of CTEQ6.6 [43], which included such datasets. This suggests that no tension between 
data should arise when these older data are included in the fit. 

2.3 Generation of the pseudo-data sample

Following Ref. [4], error propagation from experimental data to the fit is handled by
a Monte Carlo sampling of the probability distribution defined by data. The statistical
sample is obtained by generating $N_{\rm rep}$ artificial replicas of data points following a
multi-gaussian distribution centered on each data point with the variance given by the
experimental uncertainty as discussed in Sect. 2.4 of Ref. [1].

Appropriate statistical estimators have been devised in Ref. [3] in order to quantify
the accuracy of the statistical sampling obtained from a given ensemble of replicas (see
Appendix B of Ref. [3]). Using these estimators, we have verified that a Monte Carlo
sample of pseudo-data with $N_{\rm rep} = 1000$ is sufficient to reproduce the mean values, the
variances, and the correlations of experimental data with a 1% accuracy for all the
experiments. The statistical estimators for the Monte Carlo generation of artificial replicas of
the experimental data are shown for each of the datasets included in the fit in Tables 3
and 4.
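
The idea of sampling the data distribution can be illustrated with a simplified NumPy sketch. It draws replicas from a single multivariate Gaussian defined by the central values and a covariance matrix, and then compares replica averages with the inputs. This is a simplification: the procedure referred to above treats the different sources of uncertainty separately, and the toy numbers and function name below are hypothetical.

```python
import numpy as np


def generate_replicas(central, cov, n_rep, seed=0):
    """Draw n_rep pseudo-data replicas from a multivariate Gaussian."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(central, cov, size=n_rep)


# Toy dataset: three points with correlated uncertainties (illustrative only).
central = np.array([1.0, 2.0, 4.0])
sigma = np.array([0.05, 0.10, 0.20])
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
cov = corr * np.outer(sigma, sigma)

replicas = generate_replicas(central, cov, n_rep=1000)

# Replica estimates of means, standard deviations and correlations.
mean_rep = replicas.mean(axis=0)
std_rep = replicas.std(axis=0, ddof=1)
corr_rep = np.corrcoef(replicas, rowvar=False)
```

Comparing `mean_rep`, `std_rep` and `corr_rep` with `central`, `sigma` and `corr` as the number of replicas grows is the toy analogue of the estimators reported in the tables.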
Estimator                                        Value
r[F]                                             1.00000
<sigma^(exp)>_dat (%)                            11.3
<sigma^(gen)>_dat (%)                            11.4
r[sigma^(gen)]                                   0.99996
<rho^(exp)>_dat                                  0.176
<rho^(gen)>_dat                                  0.179
r[rho^(gen)]                                     0.99676

Table 3: Table of statistical estimators for the Monte Carlo sample of $N_{\rm rep} = 1000$ replicas. All
estimators are defined in Appendix B of Ref. [3]. Note that uncertainties are given as percentages.



Experiment   r[F]    <sigma^(exp)>_dat (%)   <sigma^(gen)>_dat (%)   r[sigma]   <rho^(exp)>_dat   <rho^(gen)>_dat   r[rho]
NMC-pd       1.000    1.78                    1.72                   0.999      0.03              0.03              0.963
NMC          1.000    4.91                    4.89                   0.998      0.16              0.16              0.987
SLAC         1.000    4.20                    4.16                   0.999      0.31              0.29              0.986
BCDMS        1.000    5.73                    5.70                   0.999      0.47              0.46              0.994
HERAI-AV     1.000    7.52                    7.53                   1.000      0.07              0.07              0.951
CHORUS       1.000   14.83                   14.92                   0.999      0.09              0.09              0.998
FLH108       1.000   71.90                   70.78                   1.000      0.64              0.63              0.997
NTVDMN       1.000   21.22                   21.10                   0.998      0.03              0.03              0.978
ZEUS-H2      1.000   13.79                   13.56                   1.000      0.28              0.28              0.994
DYE605       1.000   22.60                   23.11                   1.000      0.47              0.48              0.983
DYE866       1.000   20.76                   20.73                   1.000      0.20              0.19              0.989
CDFWASY      1.000    5.99                    6.06                   0.999      0.55              0.53              0.995
CDFZRAP      1.000   11.51                   11.52                   1.000      0.82              0.82              0.999
D0ZRAP       1.000   10.23                   10.50                   0.999      0.53              0.54              0.995
CDFR2KT      1.000   22.97                   22.92                   1.000      0.77              0.77              0.998
D0R2CON      1.000   16.82                   17.18                   1.000      0.78              0.78              0.997

Table 4: Same as Table 3 for individual experiments. Note that uncertainties are given as
percentages.
3 The FastKernel method



One of the main upgrades in the NNPDF analysis framework used for this paper has been 
a new fast implementation of the method for the solution of DGLAP evolution equations 
and the computation of factorized observables developed in Refs. [3113], which we call the 
FastKernel method. The method of Refs. [SIS] is based on the idea of pre-computing 
a Green function which takes PDFs from their initial scale to the scale of physical ob- 
servable. The Green function can be determined in N space, thereby requiring a single 
(complex-space) integration for the solution of the evolution equation. Furthermore, the 
Green function can be pre-combined with the hard cross sections (coefficient functions) 
into a suitable kernel, in such a way that the computation of any observable is reduced 
to the determination of the convolution of this kernel with the pertinent parton distribu- 
tions, which are parametrized in x space using neural networks as discussed in Refs. [3JH]. 
For hadronic observables, which depend on two PDFs, a double convolution must be 
performed. 

The main bottleneck of this method is the computation of these convolutions. In 
the FastKernel method, the convolution is sped up by means of the use of interpolating 
polynomials, thereby leading to both fast evolution and fast computation of all observables 
for which the kernels have been determined. This allows us to use in the fit an exact
computation of the Drell-Yan (DY) process, which in other current global PDF fits [10,11]
is instead treated using a K-factor approximation to the NLO (and even NNLO) result,
due to lack of a fast-enough implementation.

Several tools for fast evaluation of hadronic observables have been developed recently,
based on an idea of Ref. [14]. These have been implemented for the case of jet production
and related observables in the FastNLO framework [16]. More recently, the
general-purpose interface APPLGRID based on the same idea has been constructed [17]. Also,
the method has been used in the fast x-space DGLAP evolution code HOPPET [51]. A 
related approach in the case of polarized observables is presented in Ref. [52] . The method 
which is presented in this paper is based on similar ideas, and it allows for the first time 
the fast and accurate computation of fixed target Drell-Yan cross-sections and of collider 
weak boson production. 

In this section we start with a description of the new strategy used to solve the PDF 
evolution equations in the present analysis, as well as the associated technique to compute 
DIS structure functions. Then we turn to discuss how analogous techniques can be used 
for the fast and accurate computation of hadronic observables. Although the method is 
completely general, for simplicity we restrict the discussion to the Drell-Yan process, since 
for inclusive jets FastNLO will be used instead [16] . 

3.1 Fast PDF evolution 

The notation we adopt here is similar to that of Ref. [4]; however, here we use the index $I$
to denote both the kinematical variables which define an experimental point $(x, Q^2)$ and
the type of observable, while in Ref. [3] $I$ was only labelling observables.

Before sketching the construction of the observables, we look at PDF evolution. PDFs
can be written in terms of the basis defined in Ref. [3]:

$$
f_j = \{\Sigma, g, V, V_3, V_8, V_{15}, V_{24}, V_{35}, T_3, T_8, T_{15}, T_{24}, T_{35}\} . \tag{13}
$$
Figure 2: Set of interpolating triangular basis functions.

As in Ref. [3], we do not consider the possibility of intrinsic heavy flavours, so that only
seven of these basis functions (the six lightest flavours and the gluon) need to be
independently parametrized. If $\Gamma_{jk}$ is the matrix of DGLAP evolution kernels and $(x_I, Q_I^2)$
defines the kinematics of a given experimental point, we can write the PDF evolved from
a fixed initial scale $Q_0^2$ to the scale of the experimental point as

$$
f_j(x_I, Q_I^2) = \sum_{k=1}^{N_{\rm pdf}} \int_{x_I}^{1} \frac{dy}{y}\, \Gamma_{jk}\!\left( \frac{x_I}{y}, Q_0^2, Q_I^2 \right) f_k(y, Q_0^2) . \tag{14}
$$

In Ref. the integral in Eq. (|28p was performed numerically by means of a gaussian 
sum on a grid of points distributed between xj and 1, chosen according to the value of x\. 
Here instead we use a single grid in x, independent of the xi value. We label the set of 
points in the grid as x a by a = 1, N x , with 

^min = Xi < X 2 < ... < X Nx -i < X Nx = 1. 

Having chosen a grid of points, we define a set of interpolating functions 
$\mathcal{I}^{(\alpha)}$ such that: 

$$ \mathcal{I}^{(\alpha)}(x_\alpha) = 1, \qquad \mathcal{I}^{(\alpha)}(x_\beta) = 0 \quad (\beta \neq \alpha), \qquad \sum_{\alpha=1}^{N_x} \mathcal{I}^{(\alpha)}(y) = 1 \quad \forall y. \tag{15} $$

An illustrative example is given by the basis of functions drawn in Fig. 2. Each function 
has a triangular shape centered at $x_\alpha$ and vanishes outside the interval 
$(x_{\alpha-1}, x_{\alpha+1})$. For any $y$, only two triangular functions are nonzero and 
their sum is always equal to one. 
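
The defining properties of Eq. (15) can be checked numerically for the triangular basis. The 
following is a minimal sketch, not the FastKernel implementation: the function names and the 
example grid are illustrative assumptions, and the grid is assumed to have at least two points. 

```python
def triangular_basis(grid, alpha, y):
    """Hat function centred on grid[alpha] (0-based index), evaluated at y."""
    x = grid[alpha]
    # Rising edge on [x_{alpha-1}, x_alpha]
    if alpha > 0 and grid[alpha - 1] <= y <= x:
        return (y - grid[alpha - 1]) / (x - grid[alpha - 1])
    # Falling edge on [x_alpha, x_{alpha+1}]
    if alpha < len(grid) - 1 and x <= y <= grid[alpha + 1]:
        return (grid[alpha + 1] - y) / (grid[alpha + 1] - x)
    return 0.0


def interpolate(grid, values, y):
    """Linear interpolation written as a sum over basis functions."""
    return sum(v * triangular_basis(grid, a, y) for a, v in enumerate(values))


# Hypothetical grid, denser at small x
example_grid = [1e-3, 1e-2, 0.1, 0.5, 1.0]
```

At any point inside the grid at most two basis functions contribute, so evaluating the 
interpolant costs the same regardless of the number of grid points. 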

With a general interpolation basis, PDFs at the initial scale can be approximated as 

$$ f_k(y, Q_0^2) \equiv f^0_k(y) = \sum_{\alpha=1}^{N_x} f^0_k(x_\alpha)\, \mathcal{I}^{(\alpha)}(y) + O[(x_{\alpha+1} - x_\alpha)^p], \tag{16} $$

Figure 3: Set of interpolating Hermite cubic functions in the [0,1] interval. 



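where $p$ is the lowest order neglected in the interpolation. With the (linear) triangular 
basis of Fig. 2, $p = 2$. Dropping for simplicity the dependence on $Q_0^2$ and $Q_I^2$, 
Eq. (14) becomes 

$$ f_j(x_I) = \sum_{k=1}^{N_{\rm pdf}} \sum_{\alpha=1}^{N_x} f^0_k(x_\alpha) \int_{x_I}^{1} \frac{dy}{y}\, \Gamma_{jk}\!\left(\frac{x_I}{y}\right) \mathcal{I}^{(\alpha)}(y) + O[(x_{\alpha+1} - x_\alpha)^p] = \sum_{k=1}^{N_{\rm pdf}} \sum_{\alpha=1}^{N_x} f^0_k(x_\alpha)\, \sigma^I_{\alpha jk} + O[(x_{\alpha+1} - x_\alpha)^p], \tag{17} $$

where 

$$ \sigma^I_{\alpha jk} \equiv \int_{x_I}^{1} \frac{dy}{y}\, \Gamma_{jk}\!\left(\frac{x_I}{y}\right) \mathcal{I}^{(\alpha)}(y). \tag{18} $$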



In our notation $I$ specifies the data point, $\alpha$ runs over the points in the $x$-grid 
and $(j, k)$ run over the PDFs which evolve coupled to each other. Having precomputed the 
$\sigma^I_{\alpha jk}$ coefficients for each point $I$, the evaluation of the evolved PDFs 
only requires $N_x$ evaluations of the PDFs at the initial scale, independent of the point at 
which the evolved PDFs are needed, thereby reducing the computational cost of evolution. 
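
The precomputation can be illustrated with a toy example. The sketch below is written under 
explicit assumptions: the kernel `toy_kernel` is a smooth stand-in rather than a DGLAP kernel, 
the basis is the triangular one, the quadrature is a plain midpoint rule, and the grid is 
assumed to end at 1 with the point above its first node. It shows that, once the coefficients 
of Eq. (18) are stored, the evolved PDF at a point is a dot product with the initial-scale 
PDF on the grid. 

```python
def hat(grid, a, y):
    """Triangular basis function centred on grid[a]."""
    if a > 0 and grid[a - 1] <= y <= grid[a]:
        return (y - grid[a - 1]) / (grid[a] - grid[a - 1])
    if a < len(grid) - 1 and grid[a] <= y <= grid[a + 1]:
        return (grid[a + 1] - y) / (grid[a + 1] - grid[a])
    return 0.0


def midpoint(func, lo, hi, n=400):
    """Midpoint-rule quadrature of func on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


def piecewise_integral(func, x_point, grid):
    """Integrate func from x_point to 1, splitting at the grid nodes."""
    nodes = [x_point] + [x for x in grid if x > x_point]
    return sum(midpoint(func, a, b) for a, b in zip(nodes, nodes[1:]))


def toy_kernel(z):
    """Hypothetical smooth stand-in for an evolution kernel."""
    return z ** 0.5


def precompute_sigma(x_point, grid):
    """sigma_alpha = int_{x}^{1} dy/y Gamma(x/y) I_alpha(y), as in Eq. (18)."""
    return [
        piecewise_integral(
            lambda y, a=a: toy_kernel(x_point / y) * hat(grid, a, y) / y,
            x_point, grid)
        for a in range(len(grid))
    ]


def evolve(sigma, pdf_on_grid):
    """Evolved PDF at the data point: sum_alpha sigma_alpha f0(x_alpha)."""
    return sum(s * f for s, f in zip(sigma, pdf_on_grid))
```

The coefficients depend only on the kernel, the grid and the data point, so they are computed 
once; every new candidate PDF then costs one evaluation per grid node. 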

If the interpolation is performed on a more complicated set of functions than the 
triangular basis of Fig. 2, better accuracy can be obtained with a smaller number of points 
and thus a reduced computational cost. For PDF evolution we will use the cubic Hermite 
interpolation drawn in Fig. 3. With this choice, for each interval 
$y \in [x_\alpha, x_{\alpha+1})$ the function to be approximated can be written as 

$$ f^0_k(y) = h_{00}(t)\, f^0_k(x_\alpha) + h_{10}(t)\, h_\alpha m_\alpha + h_{01}(t)\, f^0_k(x_{\alpha+1}) + h_{11}(t)\, h_\alpha m_{\alpha+1} + O[(x_{\alpha+1} - x_\alpha)^4], \tag{19} $$

where 

$$ h_\alpha = g(x_{\alpha+1}) - g(x_\alpha), \qquad t = \frac{g(y) - g(x_\alpha)}{h_\alpha}, $$

and $g(y)$ is a monotonic function in [0,1] which determines the distribution of points in 
the interval (linear, logarithmic, etc.); $m_\alpha$ and $m_{\alpha+1}$ are derivatives of the 
interpolated function at the left- and right-hand ends of the interval, which can be defined 
as finite differences: 



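$$ m_\alpha = \begin{cases} \dfrac{f^0_k(x_{\alpha+1}) - f^0_k(x_\alpha)}{2h_\alpha} + \dfrac{f^0_k(x_\alpha) - f^0_k(x_{\alpha-1})}{2h_{\alpha-1}}, & \text{for } 2 \leq \alpha < N_x \\[2mm] \dfrac{f^0_k(x_{\alpha+1}) - f^0_k(x_\alpha)}{h_\alpha}, & \text{for } \alpha = 1 \\[2mm] \dfrac{f^0_k(x_\alpha) - f^0_k(x_{\alpha-1})}{h_{\alpha-1}}, & \text{for } \alpha = N_x. \end{cases} \tag{20} $$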



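Finally the functions $h$ are the third-order polynomials drawn in Fig. 3 and defined as 

$$ \begin{aligned} h_{00}(t) &= 2t^3 - 3t^2 + 1 = (1 + 2t)(1 - t)^2, \\ h_{10}(t) &= t^3 - 2t^2 + t = t(t - 1)^2, \\ h_{01}(t) &= -2t^3 + 3t^2 = t^2(3 - 2t), \\ h_{11}(t) &= t^3 - t^2 = t^2(t - 1). \end{aligned} \tag{21} $$

Collecting all terms, Eq. (19) becomes 

$$ f^0_k(y) = f^0_k(x_{\alpha-1})\, A^{(\alpha)}(y) + f^0_k(x_\alpha)\, B^{(\alpha)}(y) + f^0_k(x_{\alpha+1})\, C^{(\alpha)}(y) + f^0_k(x_{\alpha+2})\, D^{(\alpha)}(y) + O[(x_{\alpha+1} - x_\alpha)^4]. \tag{22} $$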



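Hence the function at any given point $y$ is obtained as a linear combination of $f^0$ at the 
four nearest points in the grid. The coefficients of this combination are given by: 

$$ \begin{aligned} A^{(\alpha)}(y) &= \begin{cases} 0, & \text{for } \alpha = 1 \\ -h_{10}(t)\, \dfrac{h_\alpha}{2h_{\alpha-1}}, & \text{for } \alpha \neq 1 \end{cases} \\[2mm] B^{(\alpha)}(y) &= \begin{cases} h_{00}(t) - h_{10}(t) - \dfrac{h_{11}(t)}{2}, & \text{for } \alpha = 1 \\ h_{00}(t) - \dfrac{h_{10}(t)}{2}\left(1 - \dfrac{h_\alpha}{h_{\alpha-1}}\right) - h_{11}(t), & \text{for } \alpha = N_x - 1 \\ h_{00}(t) - \dfrac{h_{10}(t)}{2}\left(1 - \dfrac{h_\alpha}{h_{\alpha-1}}\right) - \dfrac{h_{11}(t)}{2}, & \text{for } \alpha \neq 1, N_x - 1 \end{cases} \\[2mm] C^{(\alpha)}(y) &= \begin{cases} h_{01}(t) + \dfrac{h_{11}(t)}{2}\left(1 - \dfrac{h_\alpha}{h_{\alpha+1}}\right) + h_{10}(t), & \text{for } \alpha = 1 \\ h_{01}(t) + h_{11}(t) + \dfrac{h_{10}(t)}{2}, & \text{for } \alpha = N_x - 1 \\ h_{01}(t) + \dfrac{h_{11}(t)}{2}\left(1 - \dfrac{h_\alpha}{h_{\alpha+1}}\right) + \dfrac{h_{10}(t)}{2}, & \text{for } \alpha \neq 1, N_x - 1 \end{cases} \\[2mm] D^{(\alpha)}(y) &= \begin{cases} 0, & \text{for } \alpha = N_x - 1 \\ h_{11}(t)\, \dfrac{h_\alpha}{2h_{\alpha+1}}, & \text{for } \alpha \neq N_x - 1. \end{cases} \end{aligned} \tag{23} $$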
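
The interpolation of Eqs. (19)-(21) can be sketched in a few lines. This is an illustrative 
implementation under stated assumptions, not the code used in the fit: the function names are 
ours, `g` defaults to the identity (a linear distribution of points), and `y` is assumed to 
lie within the grid. 

```python
import math


def hermite_polys(t):
    """Cubic Hermite basis polynomials of Eq. (21)."""
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00, h10, h01, h11


def slopes(g_nodes, f_nodes):
    """Finite-difference derivatives m_alpha with respect to g, Eq. (20)."""
    n = len(g_nodes)
    d = [(f_nodes[i + 1] - f_nodes[i]) / (g_nodes[i + 1] - g_nodes[i])
         for i in range(n - 1)]
    # One-sided differences at the two ends, averaged slopes in between
    return [d[0]] + [0.5 * (d[i - 1] + d[i]) for i in range(1, n - 1)] + [d[-1]]


def hermite_interpolate(grid, f_nodes, y, g=lambda x: x):
    """Evaluate the cubic Hermite interpolant of Eq. (19) at y."""
    g_nodes = [g(x) for x in grid]
    m = slopes(g_nodes, f_nodes)
    # Index of the interval [x_a, x_{a+1}) containing y
    a = max(i for i in range(len(grid) - 1) if grid[i] <= y)
    h = g_nodes[a + 1] - g_nodes[a]
    t = (g(y) - g_nodes[a]) / h
    h00, h10, h01, h11 = hermite_polys(t)
    return (h00 * f_nodes[a] + h10 * h * m[a]
            + h01 * f_nodes[a + 1] + h11 * h * m[a + 1])
```

Passing `g=math.log` gives an interpolation that is uniform in $\ln x$, which is one way to 
realize a logarithmic distribution of points at small $x$. 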



If we substitute Eq. (23) into the integral for the evolution of the PDFs, with $\xi$ the 
index such that 

$$ x_\xi < x_I < x_{\xi+1}, $$

we end up with the following expressions for the $\sigma$ coefficients 



+^(iv, - (c + 2)) f£ + + 2 f r, fc (f ) a*+v 



(y). 



for a = £, 
for a = £ + 1, 



+6(n x - (e + 2)) /*«j> f r, fe M ( y ) 

+8(N X - (e + 3)) f (f ) ^+ 2 )(y), for a = £ + 2 

9(N X -(i- 1)) f F jk (f ) ^^(y) 

+«w ? r ^(f) 

+0(iV, - (a + 1)) f£» f T jfc (f^ BW(y) 



(24) 



for £ + 3 < a < N x + 1, 
for a < £. 



Despite the complicated bookkeeping, these expressions can be easily pre-computed and 
input into the fit. 
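
One compact way to organize this bookkeeping is sketched below. It is our own shorthand, not 
the form used in the text: it assumes only Eq. (18) and the four-point representation of 
Eq. (22) on every interval $[x_\beta, x_{\beta+1})$ with $\beta \geq \xi$, and the Kronecker 
deltas simply select which of the four functions multiplies $f^0_k(x_\alpha)$ on each interval. 

```latex
\sigma^I_{\alpha jk} = \sum_{\beta=\xi}^{N_x-1}
  \int_{\max(x_I,\,x_\beta)}^{x_{\beta+1}} \frac{dy}{y}\,
  \Gamma_{jk}\!\left(\frac{x_I}{y}\right)
  \left[ \delta_{\alpha,\beta-1}\, A^{(\beta)}(y)
       + \delta_{\alpha,\beta}\,   B^{(\beta)}(y)
       + \delta_{\alpha,\beta+1}\, C^{(\beta)}(y)
       + \delta_{\alpha,\beta+2}\, D^{(\beta)}(y) \right]
```

Each grid point therefore receives contributions from at most four neighbouring intervals, 
and grid points far enough below $x_I$ receive none. 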

A final remark: because of the divergent behaviour of the $x$-space evolution kernel 
at $x = 1$, the integrals which include $x_I$ in the integration interval need to be 
regularized at $y \simeq x_I$. If we consider for instance the first integral in Eq. (24), we 
can perform the same subtraction as in Ref. [4] in order to have a consistent definition of 
all precomputed coefficients: 

= i:r d f r ^ if) { Ai0 M - f A{ " ] ^) + Ai ' ] ^ /iz-m dzF ^ z) 

+ A®(x!) [r jk (K)\ N=2 - Io l/X ' +1 dzT jk (z)]. (25) 

As a result all $\sigma$ coefficients are regularized; they can be stored once and for all for 
each experimental point, given that they do not depend on the PDF at the initial scale. 

The accuracy of our PDF evolution code, described above, has been cross-checked 
against the Les Houches PDF evolution benchmark tables, originally produced from the 
comparison of the HOPPET [51] and PEGASUS [53] codes. In order to perform a meaningful 
comparison, we use the same settings described in detail in Ref. [15]. We show 
in Table 5 the relative difference for various combinations of PDFs between our PDF 
evolution and the benchmark tables of Ref. [15] at NLO in the ZM-VFNS, for three different 
grids. In each grid, the interval $[x_{\min}, 1]$ is divided into a logarithmic region at 
small $x$ and a linear region at medium and large $x$. As we can see, the choice of a 
relatively small grid of 50 points leads to reproducing the Les Houches tables with an 
accuracy of $O(10^{-5})$, more than enough for the precision phenomenology we aim at. Note 
that even though of 
course each individual replica has more structure than the average PDF, and much more 
structure than the simple Les Houches toy PDFs, they are still quite smooth on the scale 
of this grid, as it can be seen in Fig. HI This ensures that benchmarking with the Les 
Houches table is adequate to guarantee the accuracy of our evolution code. 

3.2 Fast computation of DIS observables 

Using the strategy described in the previous section, we can easily write down the expres- 
sion for the DIS observables included in our fit and show explicitly how their computation 
works on the interpolation basis. The basic idea [Hid] is that, starting with the standard 
factorized expression 

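$$ \sigma^{\rm DIS}_I(x_I, Q_I^2) = \sum_{k=1}^{N_{\rm pdf}} C_{Ik} \otimes f_k(x_I, Q_I^2) = \sum_{k=1}^{N_{\rm pdf}} \int_{x_I}^{1} \frac{dy}{y}\, C_{Ik}\!\left(\frac{x_I}{y}, \alpha_s(Q_I^2)\right) f_k(y, Q_I^2) \tag{26} $$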

(where $I$ denotes both the observable and the kinematic point), we can absorb the 
coefficient function $C_{Ik}$ into a modified evolution kernel $K_{Ij}$ which can be 
precomputed before starting the fit (see Appendix A of Ref. [1]): 

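$$ K_{Ij}(x_I, \alpha_s(Q_I^2), \alpha_s(Q_0^2)) = \sum_{k=1}^{N_{\rm pdf}} C_{Ik}(x_I, \alpha_s(Q_I^2)) \otimes \Gamma_{kj}(x_I, \alpha_s(Q_I^2), \alpha_s(Q_0^2)). \tag{27} $$

The kernel acts on the $j$-th PDF at the initial scale, and it is an observable-dependent 
linear combination of products of coefficient functions and evolution kernels: 

$$ \sigma^{\rm DIS}_I(x_I, Q_I^2) = \sum_{j=1}^{N_{\rm pdf}} K_{Ij} \otimes f^0_j(x_I, Q_0^2) = \sum_{j=1}^{N_{\rm pdf}} \int_{x_I}^{1} \frac{dy}{y}\, K_{Ij}\!\left(\frac{x_I}{y}, \alpha_s(Q_I^2), \alpha_s(Q_0^2)\right) f^0_j(y, Q_0^2). \tag{28} $$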
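
That the coefficient function can be absorbed into the kernel follows from the associativity 
of the convolution, $C \otimes (\Gamma \otimes f^0) = (C \otimes \Gamma) \otimes f^0$. The 
sketch below checks this numerically for toy power-law functions, for which the convolutions 
are known in closed form. All functions and exponents are hypothetical stand-ins chosen for 
illustration; they are not QCD coefficient functions or kernels. 

```python
def mellin_convolution(a_func, b_func, x, n=2000):
    """(A (x) B)(x) = int_x^1 dy/y A(x/y) B(y), evaluated with the midpoint rule."""
    h = (1.0 - x) / n
    total = 0.0
    for i in range(n):
        y = x + (i + 0.5) * h
        total += a_func(x / y) * b_func(y) / y
    return h * total


# Hypothetical exponents for the toy functions
A_EXP, B_EXP, C_EXP = 0.5, 1.5, 3.0


def coefficient(z):
    """Toy coefficient function C(z) = z^a."""
    return z ** A_EXP


def initial_pdf(y):
    """Toy initial-scale PDF f0(y) = y^c."""
    return y ** C_EXP


def combined_kernel(z):
    """K = C (x) Gamma in closed form for C = z^a and Gamma = z^b."""
    return (z ** A_EXP - z ** B_EXP) / (B_EXP - A_EXP)


def evolved_pdf(y):
    """Gamma (x) f0 in closed form for Gamma = z^b and f0 = y^c."""
    return (y ** B_EXP - y ** C_EXP) / (C_EXP - B_EXP)


def observable_two_step(x):
    """Evolve first, then convolute with the coefficient function."""
    return mellin_convolution(coefficient, evolved_pdf, x)


def observable_one_step(x):
    """Convolute the precombined kernel with the initial-scale PDF."""
    return mellin_convolution(combined_kernel, initial_pdf, x)
```

In the one-step form the only object that changes during a fit is `initial_pdf`; everything 
else can be tabulated beforehand. 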
x (30 pts)    $\epsilon_{\rm rel}(u_v)$   $\epsilon_{\rm rel}(d_v)$   $\epsilon_{\rm rel}(\Sigma)$   $\epsilon_{\rm rel}(g)$
1 · 10^-7     2.5 · 10^-4     3.5 · 10^-4     2.1 · 10^-4     2.1 · 10^-4
1 · 10^-6     1.6 · 10^-3     1.5 · 10^-3     2.3 · 10^-4     2.5 · 10^-4
1 · 10^-5     1.5 · 10^-3     1.4 · 10^-3     2.5 · 10^-4     2.8 · 10^-4
1 · 10^-4     6.5 · 10^-4     5.1 · 10^-4     3.0 · 10^-4     3.4 · 10^-4
1 · 10^-3     6.5 · 10^-4     4.7 · 10^-4     3.4 · 10^-4     3.9 · 10^-4
1 · 10^-2     1.4 · 10^-3     1.9 · 10^-3     3.4 · 10^-4     5.3 · 10^-4
1 · 10^-1     7.0 · 10^-4     1.0 · 10^-3     1.1 · 10^-4     4.1 · 10^-4
3 · 10^-1     1.9 · 10^-5     8.6 · 10^-5     1.3 · 10^-5     5.8 · 10^-5
5 · 10^-1     1.5 · 10^-4     1.8 · 10^-4     1.0 · 10^-4     1.1 · 10^-4
7 · 10^-1     3.8 · 10^-4     3.9 · 10^-5     3.1 · 10^-4     2.8 · 10^-4
9 · 10^-1     8.5 · 10^-3     9.5 · 10^-2     3.4 · 10^-3     2.0 · 10^-2

x (50 pts)    $\epsilon_{\rm rel}(u_v)$   $\epsilon_{\rm rel}(d_v)$   $\epsilon_{\rm rel}(\Sigma)$   $\epsilon_{\rm rel}(g)$
1 · 10^-7     2.1 · 10^-4     2.3 · 10^-4     2.7 · 10^-5     4.7 · 10^-6
1 · 10^-6     8.9 · 10^-5     8.4 · 10^-5     3.0 · 10^-5     2.1 · 10^-5
1 · 10^-5     9.3 · 10^-5     6.0 · 10^-5     2.3 · 10^-5     2.0 · 10^-5
1 · 10^-4     4.5 · 10^-5     2.8 · 10^-5     4.4 · 10^-5     4.2 · 10^-5
1 · 10^-3     3.0 · 10^-5     1.7 · 10^-5     4.0 · 10^-5     3.5 · 10^-5
1 · 10^-2     7.9 · 10^-5     6.8 · 10^-5     4.5 · 10^-5     5.8 · 10^-5
1 · 10^-1     1.7 · 10^-4     2.1 · 10^-4     1.6 · 10^-5     3.9 · 10^-5
3 · 10^-1     9.1 · 10^-6     3.9 · 10^-5     1.1 · 10^-5     1.9 · 10^-7
5 · 10^-1     2.4 · 10^-5     2.2 · 10^-5     2.2 · 10^-5     2.2 · 10^-5
7 · 10^-1     9.1 · 10^-5     1.5 · 10^-5     7.8 · 10^-5     1.2 · 10^-4
9 · 10^-1     1.0 · 10^-3     3.3 · 10^-3     8.0 · 10^-4     2.8 · 10^-3

x (100 pts)   $\epsilon_{\rm rel}(u_v)$   $\epsilon_{\rm rel}(d_v)$   $\epsilon_{\rm rel}(\Sigma)$   $\epsilon_{\rm rel}(g)$
1 · 10^-7     3.2 · 10^-5     5.0 · 10^-5     5.4 · 10^-?     2.0 · 10^-5
1 · 10^-6     2.6 · 10^-6     1.3 · 10^-6     5.7 · 10^-6     5.9 · 10^-6
1 · 10^-5     1.1 · 10^-5     2.2 · 10^-5     3.7 · 10^-6     1.0 · 10^-5
1 · 10^-4     1.8 · 10^-5     3.3 · 10^-6     1.3 · 10^-5     6.9 · 10^-6
1 · 10^-3     1.3 · 10^-6     4.9 · 10^-6     4.7 · 10^-6     7.7 · 10^-6
1 · 10^-2     1.6 · 10^-5     1.7 · 10^-5     4.8 · 10^-6     1.1 · 10^-6
1 · 10^-1     3.4 · 10^-5     2.9 · 10^-5     8.7 · 10^-6     2.1 · 10^-6
3 · 10^-1     2.0 · 10^-6     2.5 · 10^-5     7.9 · 10^-6     3.9 · 10^-6
5 · 10^-1     1.7 · 10^-5     1.3 · 10^-5     1.7 · 10^-5     3.1 · 10^-5
7 · 10^-1     7.1 · 10^-5     8.3 · 10^-6     6.3 · 10^-5     1.3 · 10^-4
9 · 10^-1     3.9 · 10^-5     3.8 · 10^-4     2.5 · 10^-5     1.7 · 10^-3

Table 5: Relative accuracy of FastKernel evolution compared to the Les Houches benchmark 
tables for PDFs evolved to the scale $Q^2 = 10^4$ GeV$^2$. The interpolation is performed 
on cubic Hermite polynomials and the grid is composed of 30 points (top), 50 points 
(middle), or 100 points (bottom), distributed logarithmically in the small-$x$ region and 
linearly in the medium- and large-$x$ region. 
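If we substitute Eq. (17) into the expression for the observable, we can write it as: 

$$ \sigma^{\rm DIS}_I(x_I, Q_I^2) = \sum_{j=1}^{N_{\rm pdf}} \sum_{\alpha=1}^{N_x} f^0_j(x_\alpha) \int_{x_I}^{1} \frac{dy}{y}\, K_{Ij}\!\left(\frac{x_I}{y}, \alpha_s(Q_I^2), \alpha_s(Q_0^2)\right) \mathcal{I}^{(\alpha)}(y) + O[(x_{\alpha+1} - x_\alpha)^p] = \sum_{j=1}^{N_{\rm pdf}} \sum_{\alpha=1}^{N_x} f^0_j(x_\alpha)\, \hat\sigma^I_{\alpha j} + O[(x_{\alpha+1} - x_\alpha)^p], \tag{29} $$

where 

$$ \hat\sigma^I_{\alpha j}(x_I, Q_0^2, Q_I^2) \equiv \hat\sigma^I_{\alpha j} = \int_{x_I}^{1} \frac{dy}{y}\, K_{Ij}\!\left(\frac{x_I}{y}, \alpha_s(Q_I^2), \alpha_s(Q_0^2)\right) \mathcal{I}^{(\alpha)}(y). \tag{30} $$

Now the only index running over the PDF basis is $j$, because the other index $k$ is 
contracted in the definition of $K$. 

Consider for example the expression for the deuteron structure function. We can write 
down explicitly the terms of Eq. (30) as: 

$$ F_2^d(x_I, Q_I^2) = \sum_{\alpha=1}^{N_x} \left[ \hat\sigma^I_{\alpha 0}\, f_0(x_\alpha) + \hat\sigma^I_{\alpha 1}\, f_1(x_\alpha) + \hat\sigma^I_{\alpha 2}\, f_2(x_\alpha) \right] + O[(x_{\alpha+1} - x_\alpha)^p], \tag{31} $$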

with 

= /i*A(c^®r-)(a)K«)(y) 
= /if [-A(c?2, ff ®r".«)(^)+i(c 2l4 ®^)(^) 
-w ® r 86 *) (f ) + 4 (cfe* ® r* M ) (f ) 

-<*(»/) (C7 2)g ®r s ^)(f)]X(-)( y ) 

= £?[-i(^®r 15 ' ? )(f) + i(^®^)(|) 
-A (c a , 9 ^ r 35 .*) (f ) + JL (c a> , r*») (f ) 

-c^^^^r^jf^)]^^ (32) 

where all kernels and coefficient functions are defined in Ref. [3] and 

/lO = ^8,0 /l = S /° = 50 
in the evolution basis of Eq. (fT3j) , 

3.3 Fast computation of hadronic observables 

The FastKernel implementation of hadronic observables requires a double convolution of 
the coefficient function with two parton distributions. We could follow the same strategy 
used for DIS: construct a kernel for each observable and each pair of initial PDFs, and then 
compute the double convolution with a suitable generalization of the method introduced 
in Sect. 3.2. However, for hadronic observables we adopt a somewhat different strategy, 
which allows us to treat in a more symmetric way processes for which a fast interface 
already exists (such as jets) and those (such as DY) for which we have to develop our own 
interface. Namely, instead of including the coefficient function into the kernel according 
to Eq. (27), we compute the convolution Eq. (26) using the fast interpolation method. 

To see how this works, consider first the case of a process with only one parton in 
the initial state. Starting from Eq. (26), we can project the evolved PDF $f_k$ onto an 
interpolation basis as follows: 

$$ \sigma^{\rm DIS}_I(x_I, Q_I^2) = \sum_{k=1}^{N_{\rm pdf}} \sum_{\alpha=1}^{N_y} f_k(y_\alpha, Q_I^2) \int_{x_I}^{1} \frac{dy}{y}\, C_{Ik}\!\left(\frac{x_I}{y}, \alpha_s(Q_I^2)\right) \mathcal{I}^{(\alpha)}(y) + O[(y_{\alpha+1} - y_\alpha)^q], \tag{33} $$

where $q$ indicates the first order neglected in the interpolation of the evolved PDFs. This 
defines another grid of points, $\{y_\alpha\}$, upon which the coefficients can be 
precomputed before starting the fit: 

$$ \int_{x_I}^{1} \frac{dy}{y}\, C_{Ik}\!\left(\frac{x_I}{y}, \alpha_s(Q_I^2)\right) \mathcal{I}^{(\alpha)}(y) \equiv C^{\alpha}_{Ik}. \tag{34} $$

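If, on top of this interpolation, we interpolate the parton distributions at the initial scale 
on the $\{x_\alpha\}$ grid as we did in the previous subsection, we get 

$$ \begin{aligned} \sigma^{\rm DIS}_I(x_I, Q_I^2) &= \sum_{k=1}^{N_{\rm pdf}} \sum_{\alpha=1}^{N_y} f_k(y_\alpha, Q_I^2)\, C^{\alpha}_{Ik} + O[(y_{\alpha+1} - y_\alpha)^q] \\ &= \sum_{k,n=1}^{N_{\rm pdf}} \sum_{\alpha=1}^{N_y} \sum_{\beta=1}^{N_x} C^{\alpha}_{Ik}\, \sigma^{\alpha}_{\beta kn}\, f^0_n(x_\beta) + O[(y_{\alpha+1} - y_\alpha)^q (x_{\beta+1} - x_\beta)^p]. \end{aligned} \tag{35} $$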

Notice that the two interpolations are independent of each other. The number of points 
N x and N y in each grid, the interpolating functions, and the interpolation orders p and q 
are not necessarily the same. 

We now apply this to the rapidity-differential Drell-Yan cross section, introduced in 
Sect. 2.2, to exemplify the procedure. The NLO cross section is given by 

$$ \begin{aligned} \frac{d\sigma^{\rm DY}}{dQ_I^2\, dY_I} = N(Q_I^2) \sum_{j=1}^{N_q} e_j^2 \int_{x_1^0}^{1} dx_1 \int_{x_2^0}^{1} dx_2 \Big\{ & \left[ q_j(x_1, Q_I^2)\, \bar q_j(x_2, Q_I^2) + q_j(x_2, Q_I^2)\, \bar q_j(x_1, Q_I^2) \right] D^{q\bar q}(x_1, x_2, Y_I) \\ & + g(x_1, Q_I^2) \left[ q_j(x_2, Q_I^2) + \bar q_j(x_2, Q_I^2) \right] D^{gq}(x_1, x_2, Y_I) \\ & + g(x_2, Q_I^2) \left[ q_j(x_1, Q_I^2) + \bar q_j(x_1, Q_I^2) \right] D^{qg}(x_1, x_2, Y_I) \Big\}, \end{aligned} \tag{36} $$

where the normalization factor is explicitly written in Sect. 2.2 and the coefficient functions 
can be found in Refs. [54, 55] (see also Appendix B). 

For each point of the interpolation grid, we define a set of two-dimensional interpolating 
functions as the product of the one-dimensional functions defined in Eq. (15): 

$$ \mathcal{I}^{(\alpha,\beta)}(x_1, x_2) \equiv \mathcal{I}^{(\alpha)}(x_1)\, \mathcal{I}^{(\beta)}(x_2). \tag{37} $$

The product of two functions can be approximated by means of these interpolating functions 
as 

$$ f(y_1)\, h(y_2) = \sum_{\alpha,\beta=1}^{N_y} f(y_{1,\alpha})\, h(y_{2,\beta})\, \mathcal{I}^{(\alpha,\beta)}(y_1, y_2) + O[(y_{1,\alpha+1} - y_{1,\alpha})^q (y_{2,\beta+1} - y_{2,\beta})^q]. \tag{38} $$
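
The product interpolation of Eqs. (37) and (38) can be sketched with the triangular basis. 
This is an illustrative example under our own assumptions (function names, a common grid for 
both variables, and points lying within the grid); it is not the FastKernel code. 

```python
def hat(grid, a, y):
    """One-dimensional triangular basis function centred on grid[a]."""
    if a > 0 and grid[a - 1] <= y <= grid[a]:
        return (y - grid[a - 1]) / (grid[a] - grid[a - 1])
    if a < len(grid) - 1 and grid[a] <= y <= grid[a + 1]:
        return (grid[a + 1] - y) / (grid[a + 1] - grid[a])
    return 0.0


def product_basis(grid, a, b, y1, y2):
    """Two-dimensional basis function as a product of one-dimensional ones."""
    return hat(grid, a, y1) * hat(grid, b, y2)


def interpolate_product(grid, f, h, y1, y2):
    """Approximate f(y1) * h(y2) from the values of f and h on the grid."""
    n = len(grid)
    return sum(f(grid[a]) * h(grid[b]) * product_basis(grid, a, b, y1, y2)
               for a in range(n) for b in range(n))
```

For any pair $(y_1, y_2)$ at most four terms of the double sum are nonzero, and the result is 
exact whenever both functions are linear between neighbouring grid points. 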

Applying Eq. <$g§ to the PDFs in Eq. ([37]) . we get 

da DY Nq Nx 

d Q2 dY = n ( ( 3?)5Z e i [%(Vh<*)%(y2,p) + Qi(Vi,<*)Qifa,p)] ( 39 ) 

i=l Q,/3=l 

f dxt [ dx 2 l^\x 1 ,x 2 )D^(x 1 ,X2,Y I ) 

Jx° Jx° 

+ [g(yi, a )(q J (y 2 ,p) + q j (y 2 M f dxi f dx 2 l^\x u x 2 )D^{x 2 ,x u Y I ) 



/„0 

1 x 2 



+ [g(yi,a)(qj(y 2 ,p) + qjiyw))] / d Xl / dx 2 X^)(x 1 ,x 2 )^(x 1 ,x 2 ,y / ) 

- yi, Q ) 9 (2/2,/3+i - y2,p) q ], 

where at next-to-leading order $D^{qg}(x_1, x_2, Y_I) = D^{gq}(x_2, x_1, Y_I)$. Therefore, we 
can define 

$$ C^{ij}_{\alpha\beta} = \int_{x_1^0}^{1} dx_1 \int_{x_2^0}^{1} dx_2\, \mathcal{I}^{(\alpha,\beta)}(x_1, x_2)\, D^{ij}(x_1, x_2, Y_I), \tag{40} $$

where $i, j$ run over the non-zero combinations of $q$, $\bar q$ and $g$. By substituting them 
into Eq. (39), we end up with the expression 



dc 

d0 2 dY = n (Q 2 i)^2 e2 j Yl c tm + ??(yi 1 °)Q'i(y2^)] 

+<f b(»i,«)fe(ite,/9) + ^(ifa^))] 

+cftf [fe(yi,a) + ^(2/M))5(2/2^)] 

+0[(yi )a+1 - yi, Q ) 9 (y 2 ,/3+i - y 2 ,/3) 9 ] , (41) 

which is the analogue of Eq. (33) for a hadronic observable. The physical basis $\{q_j\}$ and 
the evolution basis $\{f_j\}$ are related by a matrix $A$: 

$$ q_j = A_{jr} f_r, \qquad \bar q_j = \bar A_{js} f_s. $$

Each PDF $f$ is evolved to the physical scale of the process, and the evolution matrix 
$\Gamma$ which relates the initial scale PDFs to the evolved ones is 

$$ f_r(x, Q^2) = \Gamma_{rn}(x, Q_0^2, Q^2) \otimes f_n(x, Q_0^2). $$

Therefore Eq. (41) becomes 

N q N y ATpdf 



da 



DY 



dQjdY! 



Defining 



(Q?)E e ' E E c Xf ) (M.+Vi.)/rW/.M ( 42 ) 

J = l a,/J=l r,s=l 

+ [C*£f M^s + A,.) + C^\A jr + A^)^] fr(yi, a )fs(V2,P)) 

+C[(yi,a+i - yi,a) 9 (y2,/3+i - s^) 9 ]- 



d rs = E e i [^(^js + Aj- S ) + (Aj> + A,>)£ a2 ] 



(43) 



and applying Eq. (|17p to the evolved PDFs, we end up with a result which is similar to 
Eq. m: 



da DY 
dQjdY! 



n 



N x N P dC r N y iVp df 

«?) E E E E 

7,5=1 i,m=l a,/3=l r,s=l 



Ssm 



(44) 



+0[(yi t0l+1 - yi, a ) g (y2,i3+i - y 2 , / 3) g (x 1 ^ +1 - x lt -y) p (x 2 ,5+i - x 2 ,s) p ]- 



In order to define the coefficients in Eq. (40), we have to make an explicit choice of 
an interpolating basis. For the interpolation of the evolved PDFs we use the triangular 
interpolating basis drawn in Fig. 2, defined as 

$$ E^{(\alpha)}(y) = \frac{y - y_{\alpha-1}}{y_\alpha - y_{\alpha-1}}\, \theta[(y_\alpha - y)(y - y_{\alpha-1})] + \frac{y_{\alpha+1} - y}{y_{\alpha+1} - y_\alpha}\, \theta[(y_\alpha - y)(y - y_{\alpha+1})]. \tag{45} $$

With this definition, we can project the PDFs on the triangular basis 

$$ q(y) = \sum_{\alpha=1}^{N_x} q(y_\alpha)\, E^{(\alpha)}(y) + O[(y_{\alpha+1} - y_\alpha)^2] $$

and define 

$$ C^{K,ij}_{\alpha\beta} = \int_{x_1^0}^{1} dx_1 \int_{x_2^0}^{1} dx_2\, E^{(\alpha)}(x_1)\, E^{(\beta)}(x_2)\, D^{ij}_{(K)}(x_1, x_2), \tag{46} $$

where $K$ indicates the perturbative order and $i, j$ run over the non-zero combinations of 
$q$, $\bar q$ and $g$. To be more explicit, defining the indices $\xi$ and $\zeta$ in such a way that 



X£ < Xi < X£ < x\ < X£ + l, 



(47) 
we can give the precise definition of the NLO coefficients: 



C 



K,ij 



dxt Q +1 dx 2 EM( X1 ) eW(x 2 )dIP( Xi ,x 2 ), « = e, ? + 1, /3 = C, C + 1 

lf{ +1 tel /J 5 -! 1 ^2 E^{ Xl ) E^{x 2 )D\f\x 1: x 2 ), a < ? + 1, /3 > C + 2, 

/"h 1 dx i dx * E{a) ^) E^{x 2 )D^\x 1: x 2 ), a > e + 2, /3 < C + 1, (48) 

dxi dX2 E{a){xi) EW {^)D\f\ Xl ,x 2 ), a > £ + 2, P > ( + 2, 

a < f - 1, /?<C-1, 

while the expression for the LO is trivial, given that $D^{q\bar q}_{(0)}(x_1, x_2) = \delta(x_1 - x_1^0)\, \delta(x_2 - x_2^0)$. 

The FastKernel method for hadronic observables is easily interfaced to other existing 
fast codes, such as FastNLO for inclusive jets [16], by simply using FastKernel for the 
interpolation at the initial scale and parton evolution, and exploiting the existing interface 
for the convolution of the evolved PDF with the appropriate coefficient functions. In the 
particular case of the inclusive jet measurements used in the present analysis, the analogs of 
the coefficients dftp in Eq. gU can be directly extracted from the FastNLO precomputed 
tables through its interface, although in such case the relevant PDF combinations are 
different from those of the DY process, Eq. (41). 



3.4 FastKernel benchmarking 

It is straightforward to extend the FastKernel method described in the previous section to 
all fixed-target DY and collider vector boson production datasets described in Sect. 2.2, 
using the appropriate couplings and PDF combinations. More details on the computation 
of these observables can be found in Appendix B. 

In order to assess the accuracy of the method, we have benchmarked the results obtained 
with our code against those produced by an independent code [56] which computes the 
exact NLO cross sections for all relevant Drell-Yan distributions. The comparison is 
performed by using a given set of input PDFs and evaluating the various cross sections for 
all observables included in the fit at the kinematical points which correspond to the 
included data. 

The benchmarking of the FastKernel code for the Drell-Yan process has been performed 
for the following observables, introduced in Sect. 2.2: 

• Rapidity and $x_F$ distributions and asymmetries for fixed target Drell-Yan in pp and 
pCu collisions (E605 and E866 kinematics) 

• The W rapidity distribution and asymmetries at hadron colliders (Tevatron kine- 
matics) 

• The Z rapidity distribution at hadron colliders (Tevatron kinematics) 

The results of this benchmark comparison are displayed in Fig. 5, where the relative 
accuracy between the FastKernel implementation and the exact code is shown for all data 
points included in the NNPDF2.0 fit. This accuracy has been obtained with a grid of 100 
points distributed as the square root of the logarithm from $x_{\min}$ to 1. 

It is clear from Fig. 5 that with a linear interpolation performed on a 100-point grid, 
we get a reasonable accuracy for all points, 1% in the worst case, which is suitable because 
the experimental uncertainties of the available datasets are rather larger (see Table 2). 
This accuracy can be improved arbitrarily by increasing the number of points in the 
grid, with a very small cost in terms of speed: this is demonstrated in Fig. 6, where we 
show the improvement in accuracy obtained by using a grid of 500 points. 
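
A grid of this kind can be generated as follows. The sketch rests on one reading of 
"distributed as the square root of the logarithm", namely that the nodes are equally spaced in 
the variable $u = \sqrt{\ln(1/x)}$; this interpretation, the function name and the example 
value of $x_{\min}$ are assumptions made for illustration. 

```python
import math


def sqrt_log_grid(x_min, n_points):
    """Nodes equally spaced in u = sqrt(ln(1/x)), running from x_min up to 1."""
    u_max = math.sqrt(math.log(1.0 / x_min))
    grid = []
    for i in range(n_points):
        # u decreases linearly from u_max (x = x_min) to 0 (x = 1)
        u = u_max * (1.0 - i / (n_points - 1))
        grid.append(math.exp(-u * u))
    return grid


# Hypothetical example: 100 points starting at x_min = 1e-5
example_grid = sqrt_log_grid(1e-5, 100)
```

With this spacing the nodes are dense at small $x$ and the spacing widens smoothly towards 
$x = 1$, without a sharp transition between two separate regions. 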
Figure 5: Relative accuracy for NLO Drell-Yan rapidity distributions using the FastKernel 
method, compared to the code of Ref. [56], as a function of rapidity $y$. Each point 
corresponds to the kinematics of a data point included in the NNPDF2.0 fit. The accuracy 
refers to a grid of 100 points distributed as the square root of the logarithm from 
$x_{\min}$ to 1. 

Figure 6: Same as Fig. 5 for 40 points in the kinematical range covered by the data points 
included in the NNPDF2.0 fit, using a grid of 500 points distributed as the square root of 
the logarithm from $x_{\min}$ to 1. 






4 Minimization and stopping 



As discussed at length in Ref. [4], our parametrization of PDFs differs from other ap- 
proaches in that we use an unbiased basis of functions characterized by a very large and 
redundant set of parameters: neural networks. This requires a detailed analysis of the 
fitting strategy. There are two difficulties that have to be overcome. First, it is neces- 
sary to devise an algorithm to fit neural network PDFs: observables depend nonlocally 
and sometimes nonlinearly on PDFs through convolutions, and the fitting strategy must 
deal with this dependence. We have solved this difficulty by means of genetic algorithms. 
Second, any redundant parameterization may accommodate not only the smooth shape 
of the "true" underlying PDFs, but also fluctuations of the experimental data. The best 
fit form of the set of PDFs is not just given by the absolute minimum of some figure of 
merit: it is the possibility of further decreasing the figure of merit which guarantees that 
the best fit is not driven by the functional form of the parameterization. The best fit is 
instead given by a suitable optimal training, beyond which the figure of merit improves 
only because one is fitting the statistical noise in the data, which raises the question of 
how this optimal fit is determined. We solve this through the so-called cross-validation 
method [57], based on the random separation of data into training and validation sets. 
Namely, PDFs are trained on a fraction of the data and validated on the rest of the data. 
Training is stopped when the quality of the fit to validation data deteriorates while the 
quality of the fit to training data keeps improving. This corresponds to the onset of a 
regime where neural networks start to fit random fluctuations rather than the underlying 
physics (overlearning). 
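
The stopping idea can be sketched with a simple look-back rule. This is a generic 
illustration only: the function name, the `window` parameter and the comparison over a fixed 
number of generations are our assumptions, and the actual stopping criterion used in the fit 
is not reproduced here. 

```python
def stopping_generation(train_err, valid_err, window=3):
    """Return the first generation at which, compared with `window` generations
    earlier, the training error has decreased while the validation error has
    increased. Return None if this never happens."""
    for gen in range(window, len(train_err)):
        training_improves = train_err[gen] < train_err[gen - window]
        validation_worsens = valid_err[gen] > valid_err[gen - window]
        if training_improves and validation_worsens:
            return gen
    return None
```

Comparing over a window rather than between consecutive generations makes the rule less 
sensitive to generation-by-generation fluctuations of the two error functions. 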

4.1 Genetic algorithm strategy 

The fitting of a set of neural networks (which parameterize the PDFs) to the data is 
performed by minimization of a suitably defined figure of merit [4]. This is a complex task 
for two reasons: we need to find a reasonable minimum in a very large parameter space, 
and the figure of merit to be minimized is a nonlocal functional of the set of functions 
which are being determined in the minimization. Genetic algorithms turn out to provide 
an efficient solution to this minimization problem. 

The basic idea underlying genetic algorithm minimization is to create a pool of possible 
solutions to minimize the figure of merit, each one characterized by a set of parameters. 
Genetic algorithms work on the parameter space, creating new possible solutions and 
discarding those which are far from the minimum. As a consequence, the genetic algorithm 
cycle corresponds to successive generations where: i) we create new possible solutions by 
mutation and crossing; ii) we naturally select the best candidates and eliminate the rest. 
This strategy has proven to be generally very useful to deal with minimization of functional 
forms which are further convoluted to deliver observables (see Refs. [58,59] for applications 
unrelated to PDF fitting). 

The fitting of the neural networks on the individual replicas is performed by minimizing 
the error function [3] 

$$ E^{(k)} = \frac{1}{N_{\rm dat}} \sum_{I,J=1}^{N_{\rm dat}} \left( F_I^{({\rm art})(k)} - F_I^{({\rm net})(k)} \right) \left( ({\rm cov}_{t_0})^{-1} \right)_{IJ} \left( F_J^{({\rm art})(k)} - F_J^{({\rm net})(k)} \right), \tag{49} $$

where $F_I^{({\rm art})(k)}$ is the value of the observable $F_I$ at the kinematical point $I$ 
corresponding to the Monte Carlo replica $k$, $F_I^{({\rm net})(k)}$ is the same observable 
computed from the neural network PDFs, and the $t_0$ covariance matrix ${\rm cov}_{t_0}$ has 
been defined in Eq. (1). The details of how genetic algorithm minimization is applied to the 
problem of PDF fitting were presented in Ref. [4]. This strategy has now been improved in 
order to deal with the addition of multiple new experimental datasets, as we shall now 
discuss. 
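
The structure of the error function in Eq. (49) can be made concrete with a small example. 
The sketch below is illustrative: the function names and the two-point inputs are hypothetical, 
and the linear system is solved by plain Gaussian elimination rather than by whatever 
numerical method is used in the fit. 

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def error_function(data_replica, prediction, covariance):
    """E = (1/N_dat) d^T cov^{-1} d, with d = data_replica - prediction."""
    diff = [d - p for d, p in zip(data_replica, prediction)]
    # cov^{-1} d is obtained by solving cov @ w = d, avoiding an explicit inverse
    weighted = solve(covariance, diff)
    return sum(d * w for d, w in zip(diff, weighted)) / len(diff)
```

For a diagonal covariance matrix this reduces to the familiar sum of squared residuals 
divided by the variances; off-diagonal entries encode correlated uncertainties. 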



4.2 Targeted weighted training 

In order to deal more efficiently with the need of fitting data from a wide variety of different 
experiments and different datasets within an experiment we adopt a dynamical weighted 
fitting technique. The basic idea is to construct a minimization procedure that rapidly 
converges towards a configuration for which the final figure of merit E^ is as even as pos- 
sible among all the experimental sets. Weighted fitting consists of adjusting the weights of 
the datasets in the determination of the error function during the minimization procedure 
according to their individual figure of merit: datasets that yield a large contribution to 
the error function get a larger weight in the total figure of merit. 

In a first epoch of the genetic algorithm minimization, weighted training is activated. 
This means that, rather than Eq. (49), the actual function which is minimized is 

$$ E^{(k)}_{\rm wt} = \frac{1}{N_{\rm dat}} \sum_{j=1}^{N_{\rm sets}} p_j^{(k)}\, N_{{\rm dat},j}\, E_j^{(k)}, \tag{50} $$

where $E_j^{(k)}$ is the error function in Eq. (49) restricted to the dataset $j$, 
$N_{{\rm dat},j}$ is the number of points of this dataset and $p_j^{(k)}$ are weights 
associated to this dataset which are adjusted dynamically as described below. 

In the present analysis, a different, more refined way of determining the weights $p_j^{(k)}$ 
has been adopted as compared to Refs. [3,4]. The idea is the following: at the beginning 
of the fit, target values $E_i^{\rm targ}$ for the figure of merit of each experiment are 
chosen. Then, at each generation of the minimization, the weights of individual sets are 
updated using the conditions 

1. If $E_i^{(k)} \geq E_i^{\rm targ}$, then $p_i^{(k)} = \left( E_i^{(k)} / E_i^{\rm targ} \right)^2$, 

2. If $E_i^{(k)} < E_i^{\rm targ}$, then $p_i^{(k)} = 0$. 

Hence, sets which are far above their target value will get a larger weight in the figure 
of merit. On the other hand, sets which are below their target are likely to be already 
learnt properly and thus are removed from the figure of merit which is being minimized. 
The determination of the target values $E_i^{\rm targ}$ for all the sets which enter into the 
fit is an iterative procedure that works as follows. We start with all $E_i^{\rm targ} = 1$ 
and proceed to a first very long fit. Then, we use the outcome of the fit to produce a first 
nontrivial set of $E_i^{\rm targ}$ values. This procedure is iterated until convergence. In 
practice, convergence is very fast: we have used the values of $\langle E_i \rangle$ from a 
first batch of 100 replicas, in turn produced using as target values those of a previous very 
long fit; these values differ generally by 2-4% (at most 10% in a couple of cases) from the 
values of $\langle E_i \rangle$ for the reference fit shown in Table 10. This implementation 
of targeted weighted training is such that the error function of each dataset tends smoothly 
to its "natural" value, that is, $p_i^{(k)} \to 1$ as the minimization progresses. Those sets 
which are harder to fit are given more weight than the experiments that are learnt faster. 
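
The weight update and the weighted figure of merit can be sketched as follows. This is a 
minimal illustration assuming the two conditions listed above and the form of Eq. (50); the 
function names and the numerical inputs in the usage lines are hypothetical. 

```python
def update_weights(errors, targets):
    """Per-dataset weights: (E / E_targ)^2 if E is at or above target, else 0."""
    return [(e / t) ** 2 if e >= t else 0.0 for e, t in zip(errors, targets)]


def weighted_error(errors, n_points, weights):
    """Weighted figure of merit: sum_j p_j N_dat,j E_j / N_dat."""
    n_dat = sum(n_points)
    return sum(p * n * e for p, n, e in zip(weights, n_points, errors)) / n_dat


# Hypothetical example with three datasets
errors = [2.0, 0.9, 1.2]      # current error function of each dataset
targets = [1.0, 1.0, 1.2]     # target values of each dataset
weights = update_weights(errors, targets)
```

A dataset sitting exactly at its target gets unit weight, so the weighted and unweighted 
figures of merit coincide once every dataset has reached its target value. 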

An important feature of weighted training is that weights are given to individual 
datasets (as identified in Table 1) and not just to experiments. This is motivated by 
the fact that typically each dataset covers a distinct, restricted kinematic region. Hence, 
the weighting takes care of the fact that the data in different kinematic regions carry 
different amounts of information and thus require unequal amounts of training. 

As an illustration of our procedure, we show in Fig. 7 the $p_i^{(k)}$ weight profiles as a 
function of the number of genetic algorithm generations for some sets of a given typical 
replica. Note how, at the early stages of the minimization, sets which are harder to learn, 
such as BCDMSp or NMC-pd, are given more weight than the rest, while at the end of the 
weighted training epoch all weights are either $p_i^{(k)} \sim 1$ or oscillate between 0 and 1, 
a sign that these sets have been properly learnt. 

The targeted weighted training epoch lasts for $N^{\rm wt}_{\rm gen}$ generations, unless the 
total error function Eq. (49) is above some threshold, $E^{(k)} \geq E^{\rm sw}$. If it is, 
weighted training continues until $E^{(k)}$ falls below the threshold value. Afterwards, the 
error function is just the unweighted error function Eq. (49) computed on experiments. In this 
final training epoch, a dynamical stopping of the minimization is activated, as we shall 
discuss in the next section. Going through a final training epoch with the unweighted error 
function is in principle important in order to eliminate any possible residual bias from the 
choice of $E_i^{\rm targ}$ values in the previous epoch. However, in practice this safeguard 
has little effect, 
as it turns out that all weights tend to unity at the end of the targeted weighted training 
epoch as they ought to. The whole procedure ensures that a uniform quality of the fit for 
all datasets is achieved, and that the fit is refined using the correct figure of merit which 
includes all the information on correlated systematics. 

4.3 Genetic algorithm parameters 

Genetic algorithms are controlled by some parameters that can be tuned in order to 
optimize the efficiency of the whole minimization procedure. The creation of new candidate 
PDFs that can lower the figure of merit used in the minimization is implemented using 
mutations. That is, each PDF is modified by changing some of the parameters that define 
the neural network. In this work, the initial mutation rates $\eta^{(0)}_{i,j}$, where $i$ 
labels the PDF and $j$ the specific mutation within this PDF, for the individual PDFs are 
kept the same as in [HE]. As training proceeds, all mutation rates are adjusted dynamically 
as a function of the number of iterations $N_{\rm ite}$ 

$$ \eta_{i,j} = \eta^{(0)}_{i,j} / N_{\rm ite}^{r_\eta}. \tag{51} $$

In order to optimally span the range of all possible beneficial mutations, we introduce 
an exponent $r_\eta$ which is randomized between 0 and 1 at each iteration of the genetic 
algorithm. An analysis of the values of $r_\eta$ for which mutations are accepted in each 
generation reveals a flat profile: both large and small mutations are beneficial at all stages 
of the minimization. 
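
The dynamical adjustment of the mutation rate can be sketched as below. The example assumes 
the form of Eq. (51) with an exponent drawn uniformly at each call; the function name, the 
initial rate and the seeded generator are illustrative choices, not values from the fit. 

```python
import random


def mutation_rate(eta0, n_ite, rng):
    """Mutation rate eta0 / n_ite**r, with r drawn uniformly in [0, 1)."""
    r_eta = rng.random()
    return eta0 / n_ite ** r_eta


# Hypothetical usage: rates sampled at a late stage of the minimization
rng = random.Random(0)
late_rates = [mutation_rate(1.0, 1000, rng) for _ in range(5)]
```

Because the exponent is redrawn every time, the rate at iteration $N_{\rm ite}$ ranges between 
the initial value and the initial value divided by $N_{\rm ite}$, so both large and small 
mutations remain possible at every stage. 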

The number of mutants (new candidate solutions) in each genetic algorithm generation 
Figure 7: Illustration of the weighted training in one particular replica. Individual weights for 
each dataset converge to a value of $p_i$ which is close to 1 as the training progresses. Only the 
behaviour of representative datasets is shown. 

$N^{\rm wt}_{\rm gen}$   $N^{\rm mut}_{\rm gen}$   $N^{\rm max}_{\rm gen}$   $E^{\rm sw}$   $N^a_{\rm mut}$   $N^b_{\rm mut}$
10000                    2500                      30000                     2.6            80                10

Table 6: Parameter values for the genetic algorithm. 



depends on the stage of the training. When the number of generations is smaller than 
iV™jf, we use a large population of mutants N^ ut 3> 1, while afterwards we use a much 
reduced population iV mut <C N® nut . The rationale for this procedure is that at early stages 
of the minimization it is beneficial to explore as large a parameter space as possible, thus 
we need a large population. Once we are closer to a minimum, a reduced population 
helps in propagating the beneficial mutations to further improve the fitness of the best 
candidates. The final choices of parameters of the genetic algorithm which have been 
adopted in the NNPDF2.0 parton determination are summarized in Table El 
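
A toy sketch of the two-stage population schedule, using the values 2500, 80 and 10 from Table 6. The way a mutant is built here (a uniform shift of one randomly chosen parameter) and the choice to keep the parent among the candidates are assumptions made for illustration; the names `n_mutants` and `ga_step` are hypothetical.

```python
import numpy as np

N_GEN_MUT, N_MUT_A, N_MUT_B = 2500, 80, 10  # values quoted in Table 6

def n_mutants(generation):
    """Large population in the early stage, reduced population afterwards."""
    return N_MUT_A if generation < N_GEN_MUT else N_MUT_B

def ga_step(params, generation, figure_of_merit, rng, eta=0.1):
    """One generation: mutate copies of the parent, keep the fittest candidate."""
    candidates = [params]  # keeping the parent means the step never gets worse
    for _ in range(n_mutants(generation)):
        mutant = params.copy()
        k = rng.integers(mutant.size)  # mutate one randomly chosen parameter
        mutant[k] += eta * rng.uniform(-1.0, 1.0)
        candidates.append(mutant)
    return min(candidates, key=figure_of_merit)
```

With a quadratic figure of merit such as `lambda p: float(np.sum(p**2))`, repeated calls to `ga_step` give a non-increasing sequence of values.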

4.4 Preprocessing 

Neural networks can accommodate any functional form, provided they are made of a large 
number of layers and sufficient time is used to train them. Nevertheless, it is customary 
to use preprocessing of data to subtract some dominant functional dependence. Then, 
smaller neural networks can be trained in a short time to deal with the deviations with 
respect to the dominant function. In our case, we use preprocessing to divide out some of 
the asymptotic small and large x behaviour of PDFs. We avoid possible bias related to 
this by exploring a large space of preprocessing functions. 

In this work, preprocessing is implemented in the way described in Sect. 3.1 of Ref. [6], 
to which we refer for a more detailed discussion. However, we now adopt in the fit a wider 
randomized range of variation of preprocessing exponents, thus ensuring greater stability 
and lack of bias. The range of preprocessing exponents used here is shown in Table 7.

  PDF                   $[m_{\rm min}, m_{\rm max}]$   $[n_{\rm min}, n_{\rm max}]$   $r[\chi^2, m]$   $r[\chi^2, n]$
  $\Sigma(x,Q_0^2)$     [2.55,3.45]        [1.05,1.35]        -0.018         0.131
  $g(x,Q_0^2)$          [1.05,1.35]        [1.05,1.35]        -0.002         0.050
  $T_3(x,Q_0^2)$        [2.55,3.45]        [0,0.5]            -0.023        -0.130
  $V_T(x,Q_0^2)$        [2.55,3.45]        [0,0.5]             0.003        -0.068
  $\Delta_S(x,Q_0^2)$   [12,14]            [-0.95,-0.65]       0.000        -0.069
  $s^+(x,Q_0^2)$        [2.55,3.45]        [1.05,1.35]         0.021        -0.055
  $s^-(x,Q_0^2)$        [2.55,3.45]        [0,0.5]            -0.027        -0.015

Table 7: The range of random variation of the large-x and small-x preprocessing exponents m and 
n used in the present analysis (the precise form of these exponents is given in Sect. 3.1 of Ref. [6]). 
The last two columns give the correlation coefficient Eq. (53) between the $\chi^2$ and respectively the 
large- and small-x preprocessing exponents. 

The explicit independence of results on preprocessing exponents within the ranges 
defined in Table 7 can be verified by computing the correlation between the value of a 
given preprocessing exponent and the associated value of the $\chi^2$ computed between the 
$k$-th net and experimental data, defined by

$$\chi^{2(k)} = \frac{1}{N_{\rm dat}} \sum_{I,J=1}^{N_{\rm dat}} \left(F_I^{(\rm exp)} - F_I^{(\rm net)(k)}\right) \left(({\rm cov})^{-1}\right)_{IJ} \left(F_J^{(\rm exp)} - F_J^{(\rm net)(k)}\right) .$$ (52)

Note that we always include a factor $1/N_{\rm dat}$ in the definition of the $\chi^2$. Also, note that 
$({\rm cov})$ is the standard covariance matrix, which differs from the $t_0$ covariance matrix 
Eq. ([1]) because of the replacement of $F_I^{(0)}, F_J^{(0)}$ with the measured values $F_I, F_J$ in the 
second term on the right-hand side. 
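
As a sketch of how a $\chi^2$ per data point of the form of Eq. (52) can be evaluated numerically, assuming numpy and a covariance matrix supplied as a dense array (the function name `chi2_per_point` is hypothetical):

```python
import numpy as np

def chi2_per_point(f_exp, f_net, cov):
    """(1/N_dat) * d^T cov^{-1} d, with d = f_exp - f_net."""
    d = np.asarray(f_exp, dtype=float) - np.asarray(f_net, dtype=float)
    # Solve cov @ y = d instead of forming the inverse matrix explicitly.
    return float(d @ np.linalg.solve(cov, d)) / d.size

# Two uncorrelated points, each exactly one standard deviation off.
value = chi2_per_point([1.0, 2.0], [1.1, 1.8], np.diag([0.01, 0.04]))
```

In this uncorrelated example each point contributes one unit, so the result is 1 up to rounding.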

Therefore, we define the correlation coefficient as follows: considering for definiteness 
the large-x preprocessing exponent of the singlet PDF $\Sigma(x,Q_0^2)$, we have

$$r\left[\chi^2, m_\Sigma\right] = \frac{\langle \chi^2 m_\Sigma \rangle_{\rm rep} - \langle \chi^2 \rangle_{\rm rep} \langle m_\Sigma \rangle_{\rm rep}}{\sigma_{\chi^2}\, \sigma_{m_\Sigma}} .$$ (53)

This provides the variation $\delta\chi^2$ as the large-x exponent is varied by $\delta m_\Sigma$ around its mean 
value. The correlations we find are very weak, as shown in the last two columns of 
Table 7. It is clear that the $\chi^{2(k)}$ for the individual replicas is only marginally affected. 
This validates quantitatively the stability of our results with respect to the preprocessing 
exponents. 
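
A minimal sketch of such a check over a replica sample, assuming the correlation coefficient is the usual Pearson coefficient computed from replica averages. The arrays below are synthetic stand-ins (an exponent drawn uniformly in [2.55, 3.45] and an independent $\chi^2$), not NNPDF2.0 output.

```python
import numpy as np

def correlation(chi2, exponent):
    """Pearson correlation between per-replica chi2 values and one exponent."""
    chi2 = np.asarray(chi2, dtype=float)
    exponent = np.asarray(exponent, dtype=float)
    covariance = np.mean(chi2 * exponent) - chi2.mean() * exponent.mean()
    return covariance / (chi2.std() * exponent.std())

rng = np.random.default_rng(0)
m_sigma = rng.uniform(2.55, 3.45, size=1000)   # synthetic exponents
chi2_k = rng.normal(1.29, 0.09, size=1000)     # synthetic, independent of m_sigma
r = correlation(chi2_k, m_sigma)
```

For independent inputs `r` is close to zero, which is the pattern the last two columns of Table 7 are meant to exhibit.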

4.5 Positivity constraints 

General theoretical constraints can be imposed during the minimization procedure, thereby 
guaranteeing that the fitting procedure only explores the subspace of acceptable physical 
solutions: for example, the valence and momentum sum rules are enforced in this way [3]. 
An important theoretical constraint is the positivity of physical cross-sections. As 
discussed in Ref. [UD], positivity should be imposed on observable hadronic cross-sections 
and not on partonic quantities, which do not necessarily satisfy this constraint. 

As in Ref. [3], positivity constraints on relevant physical observables have been imposed 
during the genetic algorithm minimization using a Lagrange multiplier, which strongly 
penalizes those PDF configurations which lead to negative observables. In particular, we 
impose positivity of $F_L(x, Q^2)$, which constrains the gluon and the singlet PDFs at small 
$x$, as well as that of the dimuon cross section $d^2\sigma^{\nu,c}/dxdy$ [6], which constrains the strange 
PDFs. Positivity should hold for any physical cross section which may be measured in 
principle. In practice, most PDFs are already well constrained by actual data, so that 
positivity is only relevant for PDFs such as the gluon and the strange distributions which 
are poorly constrained by the data. 

Due to the positivity constraints, the minimized error function Eq. (49) (or Eq. (50) 
in the weighted training epoch) is modified as follows

$$E^{(k)} \to E^{(k)} - \lambda_{\rm pos} \sum_{I=1}^{N_{\rm dat,pos}} \Theta\left(-F_I^{(\rm net)(k)}\right) F_I^{(\rm net)(k)} ,$$ (54)

where $N_{\rm dat,pos}$ is the number of pseudodata points used to implement the positivity 
constraints and we choose $\lambda_{\rm pos} \sim 10^{10}$ as the associated Lagrange multiplier. Positivity of 
$F_L(x,Q^2)$ is implemented in the range $10^{-9} < x < 0.005$ and that of the dimuon cross 
section in $10^{-9} < x < 0.5$, in both cases at the initial evolution scale $Q_0^2 = 2$ GeV$^2$. This 
is done because if positivity is enforced at low scales, it will be preserved by DGLAP 
evolution. 
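
The penalty term can be sketched as follows, reading Eq. (54) as adding $-\lambda_{\rm pos}\sum_I \Theta(-F_I) F_I$ to the error function, with the observables evaluated on a grid of positivity pseudodata points. The function names are hypothetical and the input values are made up for illustration.

```python
import numpy as np

LAMBDA_POS = 1e10  # order of magnitude of the Lagrange multiplier quoted in the text

def positivity_penalty(f_net_pos, lam=LAMBDA_POS):
    """-lam * sum_I Theta(-F_I) * F_I: zero if all F_I >= 0, positive otherwise."""
    f = np.asarray(f_net_pos, dtype=float)
    return float(-lam * np.sum(np.where(f < 0.0, f, 0.0)))

def penalized_error(error, f_net_pos, lam=LAMBDA_POS):
    """Error function with the positivity penalty added."""
    return error + positivity_penalty(f_net_pos, lam)

ok = penalized_error(1.2, [0.1, 0.0, 2.0])   # no negative observable: unchanged
bad = penalized_error(1.2, [0.1, -1e-6])     # tiny negative value: heavily penalized
```

With a multiplier this large, even a negative value of order $10^{-6}$ dominates the error function, so such configurations are effectively never selected by the genetic algorithm.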

The impact of the positivity constraints on the NNPDF2.0 PDF determination will be 
quantified in Sect. 5.5.

4.6 Determination of the optimal fit 

We now turn to the formulation of the stopping criterion, which is designed to stop the 
fit at the point where the fit reproduces the information contained in the data but not its 
statistical fluctuations. The stopping criterion is applied on the training of each replica, 
and it is based on the cross-validation method, widely used in the context of neural network 
training [57]. Its application to our case has been described in detail in Refs. [3,4], so here 
we will mainly focus on the modifications introduced for NNPDF2.0. 

As discussed in the previous section, dynamical stopping is activated after $N_{\rm gen}^{\rm wt}$ 
generations of targeted weighted training. Then, the weighted training on sets is switched off 
and minimization is done using the error function Eq. (49), based on equally 
weighted experiments. The dynamical stopping criterion is only activated if a number of 
prior conditions are fulfilled. We first require that all experiments have an error function 
below some reasonable threshold $E_{\rm thres}$. Then, it is necessary that a moving average over 
the error function for the training and validation sets satisfy

$$r_{\rm tr} < 1 - \delta_{\rm tr} ; \qquad r_{\rm val} > 1 + \delta_{\rm val} ,$$ (55)

where

$$r_{\rm tr} \equiv \frac{\langle E_{\rm tr}(i) \rangle}{\langle E_{\rm tr}(i - \Delta_{\rm smear}) \rangle} ,$$ (56)

$$r_{\rm val} \equiv \frac{\langle E_{\rm val}(i) \rangle}{\langle E_{\rm val}(i - \Delta_{\rm smear}) \rangle} ,$$ (57)

and the smeared error functions are given by

$$\langle E_{\rm tr,val}(i) \rangle \equiv \frac{1}{N_{\rm smear}} \sum_{l=i-N_{\rm smear}+1}^{i} E_{\rm tr,val}(l) ,$$ (58)

with $E_{\rm tr,val}(l)$ being the figure of merit Eq. (49) restricted to the training and validation 
sets for the genetic algorithm generation $l$. 

  $N_{\rm smear}$   $\Delta_{\rm smear}$   $\delta_{\rm tr}$   $\delta_{\rm val}$        $E_{\rm thres}$   $N_{\rm gen}^{\rm max}$
  200       200       $10^{-4}$   $3 \cdot 10^{-4}$   6         30000

Table 8: Parameter values for the stopping criterion. 
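
The ratio test can be sketched in a few lines of Python. The sketch reads Eq. (55) as requiring that the smeared training error is still decreasing ($r_{\rm tr} < 1 - \delta_{\rm tr}$) while the smeared validation error is rising ($r_{\rm val} > 1 + \delta_{\rm val}$), uses the parameter values of Table 8 as defaults, and omits the $E_{\rm thres}$ precondition. Function names and the synthetic error profiles are illustrative assumptions.

```python
import numpy as np

def smeared(errors, i, n_smear):
    """Mean of the errors over the n_smear generations ending at generation i."""
    return float(np.mean(errors[i - n_smear + 1 : i + 1]))

def should_stop(e_tr, e_val, i, n_smear=200, delta_smear=200,
                delta_tr=1e-4, delta_val=3e-4):
    """Ratio conditions at generation i (0-based index into the error histories)."""
    if i - delta_smear - n_smear + 1 < 0:
        return False  # not enough history to build both moving averages
    r_tr = smeared(e_tr, i, n_smear) / smeared(e_tr, i - delta_smear, n_smear)
    r_val = smeared(e_val, i, n_smear) / smeared(e_val, i - delta_smear, n_smear)
    return bool(r_tr < 1.0 - delta_tr and r_val > 1.0 + delta_val)

# Synthetic profiles: training keeps improving, validation turns up after t = 1000.
t = np.arange(2500)
e_tr = 2.0 * np.exp(-1e-3 * t)
e_val = 2.0 + 1e-6 * (t - 1000.0) ** 2
```

On these profiles the test is negative while the validation error is still falling (for example at generation 600) and positive once it has turned upwards (for example at generation 2000).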

The values of the stopping parameters $\delta_{\rm tr}$ and $\delta_{\rm val}$ must be determined by analyzing 
the behaviour of the fit for the particular dataset which is being used for neural network 
training. As an illustration of how this is done in practice, we show in Fig. 8 the averaged 
training and validation $E_{\rm tr,val}$ ratios Eqs. (56-57) for a given replica and different values 
of the smearing length $N_{\rm smear}$. For this particular replica the training has been artificially 
prolonged beyond its stopping point. From Fig. 8 it is apparent that while the training 
ratio satisfies $r_{\rm tr} < 1$ always, i.e. that $E_{\rm tr}^{(k)}$ continues to decrease, after a given number 
of generations we have $r_{\rm val} > 1$, which then oscillates above and below 1: this is the sign 
that we have entered an 'overlearning' regime and minimization needs to be stopped. 

The optimal values of the stopping parameters are chosen to be small enough that 
overlearning is avoided, but large enough that the fit does not stop on statistical 
fluctuations. The latter condition can be met only if the value of $N_{\rm smear}$ is large enough, but 
if $N_{\rm smear}$ is too large stopping becomes very difficult and the first condition cannot be 
met. In practice, we have produced a set of 100 replicas with very long training, and for 
each value of $N_{\rm smear}$ we have tried out a range of values of $\delta_{\rm tr}$ and $\delta_{\rm val}$, until an optimal 
set of values which satisfies all the above criteria has been found. The final values of the 
parameters determined in this way are listed in Table 8. In order to avoid unacceptably 
long fits, when a very large number of iterations is reached (see Table 6) training is 
stopped anyway. This leads to a small loss of accuracy of the corresponding fits which is 
acceptable provided it only happens for a small fraction of replicas. 

In order to check the consistency of the whole procedure, we have produced a set of 
100 replicas from a fit with the same settings as the final reference fit but with no stopping 
and a large maximum number of genetic algorithm generations $N_{\rm gen}^{\rm max} = 50000$. This set 
of 100 replicas thus allows us to verify that the targeted weighted training and stopping 
criterion do not bias the fitting procedure, in that the values of $E_i^{(k)}$ do not drift away 
from the target values $E_i^{\rm targ}$ when the weighted training is switched off, and also that the 
stopping criterion does not introduce underlearning by stopping the fit at a time when the 
quality of the fit is still improving. These conclusions are borne out, and in fact, in these 
fits for many experiments and replicas the value of $E_i^{(k)}$ changes very little after the target 
values $E_i^{\rm targ}$ are reached; indeed, the target values were obtained from a very long fit 
in the first place. The average $\chi^2$ for this fit is only marginally better than that 
of the reference fit. However, some experiments do show signs of overlearning, with an 
accordingly lower value of the contribution to the $\chi^2$. 

This is illustrated in Fig. 9, where we show the $E_i^{(k)}$ profiles for two particular 
experiments (E605 and NMC-pd) and replicas taken from this fit without stopping. In the 
first training epoch, in which the weighted training Eq. (50) is activated, one can see 
oscillations, but the downwards trend is clearly visible. Once targeted weighted training 
is switched off, minimization proceeds smoothly, and we see in the two cases that after a 
given number of genetic algorithm generations we enter overlearning. For the two 
experiments the typical overlearning behaviour, characterized by the fact that the validation 
$E_{i,\rm val}^{(k)}$ is rising while the training $E_{i,\rm tr}^{(k)}$ is still decreasing, sets in at about 15000 generations. 
This is the point where dynamical stopping avoids overlearning. 

Figure 8: The training (upper plot) and validation (lower plot) ratios Eqs. (56-57) for a particular 
replica, as a function of the number of genetic algorithm generations, for various choices of the 
smearing parameter $N_{\rm smear} = \Delta_{\rm smear}$. The value $N_{\rm smear} = \Delta_{\rm smear} = 200$ is used in the reference 
fit (see Table 8). 

Figure 9: Two typical examples of overlearning behaviour, extracted from a fit with the same 
settings as the final reference fit but with no stopping and a large maximum number of genetic 
algorithm generations $N_{\rm gen}^{\rm max} = 50000$. The upper plot shows the overlearning of the E605 experiment 
observed in one particular replica, and the lower plot corresponds to the NMC-pd experiment. 
Note that in these fits weighted training is switched off at $N_{\rm gen}^{\rm wt} = 10000$. 

  $\chi^2_{\rm tot}$                                            1.21
  $\langle E \rangle \pm \sigma_E$                              2.32 ± 0.10
  $\langle E_{\rm tr} \rangle \pm \sigma_{E_{\rm tr}}$          2.29 ± 0.11
  $\langle E_{\rm val} \rangle \pm \sigma_{E_{\rm val}}$        2.35 ± 0.12
  $\langle {\rm TL} \rangle \pm \sigma_{\rm TL}$                16175 ± 6257
  $\langle \chi^{2(k)} \rangle \pm \sigma_{\chi^2}$             1.29 ± 0.09
  $\langle \sigma^{(\rm exp)} \rangle_{\rm dat}$ (%)            11.4
  $\langle \sigma^{(\rm net)} \rangle_{\rm dat}$ (%)            6.0
  $\langle \rho^{(\rm exp)} \rangle_{\rm dat}$                  0.18
  $\langle \rho^{(\rm net)} \rangle_{\rm dat}$                  0.54

Table 9: Table of statistical estimators for NNPDF2.0 with $N_{\rm rep} = 1000$ replicas. The total 
average uncertainty is given in percentage. 

5 Results 

In this section we present the NNPDF2.0 parton determination. First we discuss the 
statistical features of the fit, then we turn to a comparison of NNPDF2.0 PDFs and un- 
certainties with other PDF determinations and with previous NNPDF releases. Next we 
turn to a study of potential deviations from gaussian behaviour in PDF uncertainty bands. 
A detailed comparison between NNPDF2.0 and NNPDF1.2 follows, in which the impact of 
each of the differences between these fits is studied in turn: improved neural network train- 
ing, treatment of normalization uncertainties, impact of the combined HERA-I dataset, 
impact of the inclusion of jet and Drell-Yan data. Finally we discuss the impact of the 
positivity constraints in the PDF determination, and study the sensitivity of NNPDF2.0 
to variations in the value of the strong coupling $\alpha_s$. 

Note that while results for the NNPDF2.0 fit are obtained with $N_{\rm rep} = 1000$ replicas, 
those for all other comparisons performed here are done with $N_{\rm rep} = 100$ replicas. 

5.1 NNPDF2.0: statistical features 

The statistical features of the NNPDF2.0 analysis are summarized in Tables 9 (for the total 
dataset) and 10 (for individual experiments). Note that $E^{(k)}$ Eq. (49) and $\chi^{2(k)}$ Eq. (52) 
differ both because in the former each PDF replica is compared to the data replica it is 
fitted to, while in the latter it is compared to the actual data, and also because of the 
different treatment of normalization uncertainties as discussed after Eq. (52). The value of 
$\chi^2_{\rm tot}$ then refers to the average over replicas (best fit PDF set), while the value $\langle \chi^{2(k)} \rangle$ is the 
average (and associated standard deviation) of $\chi^{2(k)}$ computed for each replica. The average 
training length $\langle {\rm TL} \rangle$ (expressed as a number of generations of the genetic algorithm) is 
also given in this table. 
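
The difference between the $\chi^2$ of the replica average and the average of the per-replica $\chi^2$ can be made concrete with a toy sketch. The data, replica predictions and diagonal covariance matrix below are synthetic, and the function names are hypothetical; the estimator $\langle E \rangle$, which needs the data replicas, is not included.

```python
import numpy as np

def chi2(f_exp, f_net, cov):
    """chi2 per data point between predictions f_net and data f_exp."""
    d = f_exp - f_net
    return float(d @ np.linalg.solve(cov, d)) / d.size

def fit_estimators(f_exp, f_net_replicas, cov):
    """f_net_replicas has shape (N_rep, N_dat); returns (chi2_tot, mean, std)."""
    per_replica = np.array([chi2(f_exp, f, cov) for f in f_net_replicas])
    chi2_tot = chi2(f_exp, f_net_replicas.mean(axis=0), cov)  # replica average
    return chi2_tot, per_replica.mean(), per_replica.std()

rng = np.random.default_rng(0)
n_dat, n_rep, sigma = 20, 100, 0.1
truth = np.linspace(1.0, 2.0, n_dat)
cov = sigma ** 2 * np.eye(n_dat)
f_exp = truth + sigma * rng.normal(size=n_dat)              # toy data
replicas = truth + 0.05 * rng.normal(size=(n_rep, n_dat))   # toy replica predictions
chi2_tot, chi2_mean, chi2_std = fit_estimators(f_exp, replicas, cov)
```

Since the $\chi^2$ is a convex function of the prediction, the value for the replica average can never exceed the average over replicas, which matches the ordering of $\chi^2_{\rm tot}$ and $\langle \chi^{2(k)} \rangle$ in Table 9.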

The distribution of $\chi^{2(k)}$ Eq. (52), $E^{(k)}$ Eq. (49) and training lengths among the $N_{\rm rep} = 
1000$ replicas are shown in Fig. 10 and Fig. 11 respectively. While most of the replicas 
fulfill the stopping criterion, a small fraction (~12%) of them stop at the maximum 
training length $N_{\rm gen}^{\rm max}$ which, as discussed in Sect. 4.6, has been introduced in order to 
avoid unacceptably long fits. This causes some loss of accuracy in outlying fits, but we 
have checked that as $N_{\rm gen}^{\rm max}$ is raised more and more of these replicas would eventually 
stop, and that the loss of accuracy due to this choice of value of $N_{\rm gen}^{\rm max}$ is actually very 
small. 

  Experiment   $\chi^2$   $\langle E \rangle$   $\langle \sigma^{(\rm exp)} \rangle_{\rm dat}$ (%)   $\langle \sigma^{(\rm net)} \rangle_{\rm dat}$ (%)   $\langle \rho^{(\rm exp)} \rangle_{\rm dat}$   $\langle \rho^{(\rm net)} \rangle_{\rm dat}$
  NMC-pd       0.99   2.05    1.8    0.5   0.03   0.36
  NMC          1.69   2.79    4.9    1.7   0.16   0.77
  SLAC         1.34   2.42    4.2    1.9   0.31   0.84
  BCDMS        1.27   2.40    5.7    2.6   0.47   0.55
  HERAI-AV     1.14   2.25    7.5    1.3   0.06   0.44
  CHORUS       1.18   2.32   14.8   12.8   0.09   0.38
  FLH108       1.49   2.51   71.9    3.3   0.65   0.68
  NTVDMN       0.67   1.90   21.1   14.6   0.03   0.63
  ZEUS-H2      1.51   2.66   13.6    1.2   0.29   0.58
  DYE605       0.88   1.85   22.6    8.3   0.47   0.75
  DYE866       1.28   2.35   20.8    9.1   0.20   0.45
  CDFWASY      1.85   3.09    6.0    4.3   0.52   0.72
  CDFZRAP      2.02   2.96   11.5    3.5   0.83   0.65
  D0ZRAP       0.57   1.65   10.2    3.0   0.53   0.69
  CDFR2KT      0.80   2.22   23.0    5.2   0.78   0.67
  D0R2CON      0.93   1.92   16.2    6.0   0.78   0.64

Table 10: Same as Table 9 for individual experiments. Note that experimental 
uncertainties are always given in percentage. 

The features of the fit can be summarized as follows: 

• As in previous fits, the values of $\chi^2_{\rm tot}$ and $\langle E \rangle$ differ by about one unit, consistent 
with the expectation that the best fit correctly reproduces the underlying true 
behaviour about which data fluctuate, with replicas further fluctuating about data. 
Interestingly, much of the replica fluctuation is already removed by neural network 
training, i.e. when going from $\langle E \rangle$ to $\langle \chi^{2(k)} \rangle$, with only a further small amount of 
statistical fluctuation being removed when averaging over replicas to get the best-fit 
$\chi^2_{\rm tot}$. This reduction was already present in NNPDF1.2 (see the first column of 
Tab. 11 below), where however both $\langle \chi^{2(k)} \rangle$ and $\langle E \rangle$ differed rather more from the 
best fit $\chi^2_{\rm tot}$ and from each other. The improvement shows that the training and 
stopping algorithm used here and described in Sect. 4 is more efficient. 

• The quality of the fit as measured by its $\chi^2_{\rm tot} = 1.21$ has improved in comparison to 
NNPDF1.2 [6] despite the widening of the dataset to also include hadronic data. As 
we will discuss in greater detail in Sect. 5.4 below (see in particular Tab. 11) this 
improvement is largely due to the improvement in training and stopping, and to a 
lesser extent to the improved treatment of normalization uncertainties. The inclusion 
of the very precise combined HERA data then leads to a small deterioration in fit 
quality (possibly because of the lack of inclusion of charm mass effects near charm 
threshold), while the jet and DY data do not lead to any further deterioration. This 
$\chi^2$ value has very low gaussian probability and it is thus quite unlikely as a statistical 
fluctuation: it suggests experimental uncertainties might be underestimated at the 
10% level, or that there might be theoretical uncertainties of the same order. This 
appears consistent with the expected accuracy of a NLO treatment of QCD, and the 
typical accuracy with which experimental uncertainties are estimated. 

Figure 10: Distribution of $\chi^{2(k)}$ Eq. (52) (left) and $E^{(k)}$ Eq. (49) over the sample of $N_{\rm rep} = 1000$ 
replicas. 

Figure 11: Distribution of training lengths over the sample of $N_{\rm rep} = 1000$ replicas. 

• The histogram of $\chi^2$ values for each experimental dataset is shown in Fig. 12, where 
the unweighted average $\langle \chi^2 \rangle_{\rm sets} \equiv \frac{1}{N_{\rm sets}} \sum_{j=1}^{N_{\rm sets}} \chi^2_{{\rm set},j}$ and standard deviation over 
datasets are also shown. We see no evidence of any specific dataset being clearly 
inconsistent with the others, and the distribution of values looks broadly consistent 
with statistical expectations, with about five datasets with $\chi^2$ at more than one but 
less than two sigma from the average. Also, we see no obvious difference or tension 
between hadronic and DIS datasets. Clearly, the $\chi^2$ values for some experiments if 
taken at face value have low gaussian probabilities (though only one, namely NMC, 
has a probability less than 0.01%). However, they appear to be stable upon the 
inclusion of new data, thus suggesting a lack of tension between different datasets. 
For instance, the $\chi^2$ value of the NMC data is very close to that of Refs. [1,2]: this 
value thus appears to reflect the internal consistency of these data, not their 
consistency with other data. Some of the issues with specific datasets will be discussed in 
somewhat greater detail in this section below, while the behaviour of the fit quality 
as more data are included in the fit will be discussed in detail in Sect. 5.4, where 
strong evidence for the lack of tension between datasets will be presented. 

• As in previous NNPDF determinations, the uncertainty of the fit, as measured by 
the average standard deviation $\langle \sigma \rangle$, is rather smaller than that of the data: 6.0% vs. 
11.4%. The uncertainty reduction shows that the PDF determination is combining 
the information contained in the data into a determination of an underlying physical 
law. As one would expect the greatest reduction is observed in HERA DIS data, 
but sizable reductions are also seen in Drell-Yan and jet data, thus confirming the 
consistency of these data with the global dataset. 

Figure 12: Values of the $\chi^2$ (or more properly the $\chi^2$ per data point, see Eq. (52)) for the datasets 
included in the NNPDF2.0 reference fit, listed in Table 10. The horizontal line corresponds to the 
unweighted average of these $\chi^2$ over the datasets and the black dashed line to the one-sigma 
interval about it: $\langle \chi^2 \rangle_{\rm sets} = 1.06$, $\sigma_{\chi^2} = 0.40$; DIS and hadronic datasets are grouped respectively 
to the left and right of the histogram and distinguished by different colors. 
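
The 10% figure quoted above for $\chi^2_{\rm tot} = 1.21$ can be checked with a one-line estimate. The simplifying assumption, made here only for illustration, is that all uncertainties are rescaled by a common factor $k$, so that the covariance matrix scales as $k^2$ and the $\chi^2$ as $1/k^2$:

```latex
\chi^2 \;\to\; \frac{\chi^2}{k^2} ,
\qquad
\frac{1.21}{k^2} = 1
\;\Longrightarrow\;
k = \sqrt{1.21} = 1.10 ,
```

i.e. a uniform 10% increase of the uncertainties would bring the $\chi^2$ per data point to one.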



Let us now consider in greater detail the quality of the fit for some specific experiments 
whose $\chi^2$ differs by more than one sigma from the average: 

• The high value of the $\chi^2$ of the NMC $F_2^p$ data has been observed in all our previous 
PDF determinations. As already mentioned, it was first 
observed in Refs. [1,2], where a parametrization of the structure function $F_2^p(x, Q^2)$ 
was constructed without using either PDFs or QCD: hence, this value simply reflects 
the fact that the data within this set are not consistent with each other, i.e. they 
show point-by-point fluctuations which are wider than allowed by their declared 
uncertainty. 

• For dimuon data $\chi^2 \approx 0.65$, as was also the case in NNPDF1.2 [6]. As discussed there 
in detail, this stems from the fact that statistical and systematic uncertainties are 
added in quadrature for this dataset: the dominant statistical uncertainty is affected 
by a bin by bin correlation due to the unfolding procedure used in extracting the 
dimuon cross section from the measured observable, but the corresponding covariance 
matrix is not available. 

• The $\chi^2$ of the HERA-I combined data is $\chi^2 = 1.14$, somewhat larger than the value 
found when fitting the separate ZEUS and H1 data. The value comes from averaging 
the relatively large $\chi^2 \approx 1.3$ for the very precise NC positron dataset, with a low 
value $\chi^2 \approx 0.6$ for CC electron data. The reasons for this distribution of values 
are unclear; however, we note that also in NNPDF1.2 [6] the $\chi^2$ of the CC datasets 
was typically smaller than the average. We note also that the same pattern 
of $\chi^2$ among the different datasets has been obtained within the framework of the 
HERAPDF1.0 analysis of this combined HERA-I dataset [12,61]. 

• The CDF direct W asymmetry measurements have $\chi^2 = 1.85$. The poor 
compatibility of these data with the rest of the global fit data was also noted in the global 
analysis of Refs. [62,63]. 

• The quality of the fit to Z rapidity distribution data at the Tevatron differs widely 
between experiments: while an excellent fit is obtained for D0 data, CDF data 
are not so well described. This suggests that there might be a problem of internal 
consistency between the two experiments. A similar pattern was observed in the 
MSTW08 global fit [11]. Note that these datasets have a very moderate impact on 
the global fit, as proven by the fact that (see Sect. 5.4 below, in particular Table 11) 
the $\chi^2$ of these data is essentially the same in NNPDF2.0 and in NNPDF1.2 (where 
they are not fitted). 

Finally, we have checked that if we run a very long fit without dynamical stopping, the 
$\chi^2$ of the experiments whose values exceed the average by more than one sigma does not 
improve significantly. This shows that the deviation of these $\chi^2$ values from the average 
is not due to underlearning. 

5.2 Parton distributions 

The NNPDF2.0 PDFs are compared to the previous NNPDF1.0 [4] and NNPDF1.2 [6] 
parton sets in Figs. 13-16. All PDF combinations are defined as in Refs. [HE]. Note 
that all uncertainty bands shown are one-sigma; the relation to 68% confidence levels will 
be discussed in Sect. 5.3 below. The consistency between subsequent NNPDF releases, 
extensively discussed in previous work [HE], is apparent. Also apparent is the reduction 
in uncertainty obtained going from NNPDF1.2 to NNPDF2.0; the causes for this 
improvement will be discussed in detail in Sect. 5.4 below. In order to further quantify the 
differences between the NNPDF2.0 and NNPDF1.2 parton sets, the distances (as defined 
in Appendix A) between these sets are shown in Fig. 17 as a function of x: all PDFs for 
all x are consistent at the 90% confidence level, and in fact almost all are consistent to 
within one sigma. 

The NNPDF2.0 PDFs are also compared to CTEQ6.6 [10] and MSTW08 [11] PDFs in 
Figs. 18-21. Most NNPDF2.0 uncertainties are comparable to the CTEQ6.6 and MSTW08 
ones; there are however some interesting exceptions. The uncertainty on strangeness, 
which NNPDF2.0 parametrizes with as many parameters as any other PDF, is rather 
larger than those of MSTW08 and CTEQ6.6, in which these PDFs are parametrized with 
a very small number of parameters. The NNPDF2.0 uncertainty on the total quark singlet 
(which contains a sizable strange contribution) is also larger. The uncertainty on the 
small-x gluon is significantly larger than that found by CTEQ6.6, but comparable to that 
of MSTW08, which has an extra parameter to describe the small-x gluon in comparison to 
CTEQ6.6. The uncertainty on the triplet combination is rather smaller in NNPDF2.0 
than in either MSTW08 or CTEQ6.6. As we shall see in Sect. 5.4, this small uncertainty is 
largely due to the impact of Drell-Yan data (which are found to be completely consistent 
with DIS data within our NLO treatment): hence, the fact that we find it to be smaller 
than MSTW08 or CTEQ6.6 does not appear to be due to the choice of dataset. 







Figure 13: The singlet $\Sigma = \sum_i (q_i + \bar{q}_i)$, gluon $g$ and total strangeness $s^+ = s + \bar{s}$ at the initial 
scale $Q_0^2 = 2$ GeV$^2$ from the NNPDF2.0 analysis both on linear (left) and logarithmic (right) scale, 
compared to the previous NNPDF releases NNPDF1.0 [4] and NNPDF1.2 [6]. 

Figure 14: Same as Fig. 13 for the triplet $T_3 = u + \bar{u} - d - \bar{d}$, total valence $V = \sum_i (q_i - \bar{q}_i)$, sea 
asymmetry $\Delta_S = \bar{d} - \bar{u}$ and strangeness asymmetry $s^- = s - \bar{s}$. 

Figure 15: Absolute uncertainties on the PDFs of Fig. 13. 

Figure 16: Absolute uncertainties on the PDFs of Fig. 14. 

Figure 17: Distance between the NNPDF2.0 and the NNPDF1.2 parton sets (central values and 
uncertainties) for all PDFs as a function of x. All distances are computed from sets of $N_{\rm rep} = 100$ 
replicas (see Appendix A). 

Figure 18: Same as Fig. 13 but compared to MSTW08 [11] and CTEQ6.6 [10] PDFs. 

Figure 19: Same as Fig. 14 but compared to MSTW08 [11] and CTEQ6.6 [10] PDFs. 

Figure 21: Same as Fig. 16 but compared to MSTW08 [11] and CTEQ6.6 [10] PDFs. 

5.3 Confidence levels 

An important advantage of the Monte Carlo method used in the NNPDF approach to 
determine PDF uncertainties is that, unlike in a Hessian approach, one does not have 
to rely on linear error propagation. It is then possible to test the implications of a 
non-gaussian distribution of experimental data, which were found in Ref. [M] to be minor, 
and also to test for a non-gaussian distribution of the fitted PDFs even though our starting 
data and data replicas are gaussianly distributed. 
A simple way to test for non-gaussian behaviour for some quantity is to compute a 68% 
confidence level for it (which is straightforwardly done in a Monte Carlo approach), and 
compare the result to the standard deviation. This method was used in Ref. [6] to identify 
large departures from gaussian behaviour in the strange over non-strange momentum 
ratio. In Fig. 22 this comparison is shown for all NNPDF2.0 PDFs at the initial scale as 
a function of x. 
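
A minimal sketch of this comparison for a single quantity sampled over replicas. It assumes that the 68% confidence level is taken as the central interval between the 16th and 84th percentiles, which is one common convention and may differ in detail from the definition used for the figures; the function name and the synthetic samples are illustrative.

```python
import numpy as np

def one_sigma_and_cl68(samples):
    """Return (standard deviation, half-width of the central 68% interval)."""
    samples = np.asarray(samples, dtype=float)
    lo, hi = np.percentile(samples, [16.0, 84.0])
    return samples.std(ddof=1), 0.5 * (hi - lo)

rng = np.random.default_rng(0)
sd_gauss, cl_gauss = one_sigma_and_cl68(rng.normal(0.0, 1.0, size=100_000))
sd_skew, cl_skew = one_sigma_and_cl68(rng.exponential(1.0, size=100_000))
```

For the gaussian sample the two numbers agree to within about a percent, while for the skewed (exponential) sample the 68% half-width is visibly smaller than the standard deviation, which is the kind of departure the comparison is designed to reveal.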

Figure 22 shows that in the regions in which the PDFs are constrained by experimental 
data the standard deviation and the 68% confidence levels coincide to good approxima- 
tion, thus suggesting gaussian behaviour. However, in the extrapolation region for most 
PDFs deviations from gaussian behaviour are sizable. This is especially noticeable for the 
gluon at small x, and for the quark singlet and total strangeness both at small and large 
x. Deviations from gaussian behaviour are sometimes related to positivity constraints 
Eq. (54): for instance positivity of $F_L$ and the dimuon cross-section limits the possibility 
for the small-x gluon and strange sea PDFs respectively to go negative, thereby leading 
to an asymmetric uncertainty band. The impact of positivity constraints on PDFs will be 
discussed in greater detail in Sect. 5.5.
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Figure 22: Comparison of 68% confidence level and one-sigma intervals for NNPDF2.0 PDFs at 
the initial scale. 
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Table 11: Statistical estimators for the sequence of fits that take from NNPDF1.2 to NNPDF2.0. 
The estimators shown for NNPDF1.2 are as in Tabs. 5-6 of Ref. [6] and those for NNPDF2.0 are 
as in Tabs. 9-10. Estimators are shown for the total datasets in the upper part of the table, while 
the lower part of the table shows the $\chi^2$ for each individual experimental dataset. Values of the $\chi^2$ 
for data not included in any given fit are shown in italic; the total $\chi^2_{\rm tot}$ shown in the first line does 
not include the contribution from these data. The value of the $\chi^2$ in the HERAI line refers in the 
first three columns of the table to the weighted sum of the H1 and ZEUS data, and in the latter 
three columns to the combined dataset, according to which data has been included in the fit. 

5.4 Detailed comparison to NNPDF1.2: methodology and dataset 

As seen in Sect.[0]the quality of the NNPDF2.0 fit is rather better than that of NNPDF1.2, 
despite the wider dataset. We now perform a detailed comparison of these two fits, which 
differ both in procedural aspects and in dataset. In order to elucidate the impact on the 
fit of each of these, we have produced a sequence of PDF determinations that take us from 
NNPDF1.2 to NNPDF2.0 by varying one by one each of the procedural aspects, then each 
of the datasets inclusions, as follows 

(i) we start from NNPDF1.2; 

(ii) we switch to the improved genetic algorithm and minimization of Sect. [5] (IGA); 

(iii) we introduce the improved treatment of normalization uncertainties of Ref. [18]
($t_0$ method);

(iv) we replace the separate H1 and ZEUS data with the new combined HERA-I
dataset: this gives the NNPDF2.0 set, but with DIS data only (2.0-DIS);

(v) we add jet data (2.0-DIS+jet); 

(vi) we add the DY data, thereby obtaining the NNPDF2.0 fit. 

The statistical estimators for this sequence of fits are shown in Table 11 (including
the NNPDF2.0 estimators already shown in Tabs. 9-10). We will now discuss each of
these subsequent fits in turn by examining its general features, and determining and
understanding the distance (as defined in Appendix A) between PDFs obtained in each
pair of subsequent fits.
Figure 23: Distance between the NNPDF1.2 fit and a fit to the same data but with the improved
genetic algorithm and stopping (IGA).



1. Effect of the improved genetic algorithm and stopping criterion (IGA). 

The improvement in neural network training leads to a significant improvement in
fit quality: each replica fits the corresponding data replica better (lower $\langle E \rangle$), and
each replica neural network is also more efficient in subtracting the statistical noise
from data (lower $\langle \chi^{2(k)} \rangle$), thereby leading to a better global fit (lower
$\chi^2_{\rm tot}$). The improvement is due to the improved fit quality of fixed-target DIS experiments
(NMC, BCDMS and CHORUS), which probe the valence region, which has more
structure, and which moreover are known [1,2,65] to have a certain amount of data
inconsistency, without change in fit quality for other experiments: this means that
the new algorithm is more efficient in leading to a balanced fit quality between
experiments, without some data being underlearnt while others are overlearnt.

The distance between NNPDF1.2 and this fit, which only differs from it because
of the IGA, is shown in Fig. 23: the IGA affects essentially all PDFs by reducing
their uncertainties, while the two fits are always consistent at the 1-$\sigma$ level. The individual
PDFs which are most affected are the triplet, the valence and the gluon at small $x$,
which are shown in Fig. 24.

2. Impact of the treatment of normalization uncertainties. 

The IGA fit is now repeated by also using the improved $t_0$ method of Ref. [18] for
the treatment of normalization uncertainties. This leads to a further small but not
negligible improvement in fit quality, mostly due to the fixed-target DIS experiments,
which have the largest normalization uncertainties. The distances between the two fits,
which only differ in the treatment of normalizations, are shown in Fig. 25. The PDFs
which are most affected are the small-$x$ singlet and gluon and the triplet. A more
detailed discussion of the impact of the treatment of normalization uncertainties on
fits to the NNPDF1.2 dataset was presented in Ref. [18] and will not be repeated
here.

Figure 24: Comparison between PDFs from the NNPDF1.2 fit and a fit to the same data but with the
improved genetic algorithm and stopping (IGA) (the distances are shown in Fig. 23): small-$x$
gluon, valence and triplet (from left to right).

3. Impact of the combined HERA-I data. 

The previous IGA+$t_0$ fit is now repeated replacing the ZEUS and H1 data with
the new combined HERA-I dataset of Ref. [12]. This fit is now identical to the
NNPDF2.0 fit, but with only DIS data (i.e. no hadronic data) included (2.0-DIS).
The inclusion of the very precise HERA-I data leads to a slight deterioration of fit
quality, which however remains better than that of NNPDF1.2. This deterioration
is concentrated in the HERA data themselves, with the quality of the fit to
all other data unchanged. This suggests good consistency of the HERA and fixed
target data, but with the accuracy of the combined HERA-I data now exceeding the
accuracy of the theory used to describe them in NNPDF2.0: specifically, the lack of
inclusion of charm mass corrections, but also possibly deviations from NLO DGLAP
at small $x$ [66], or possible evidence for NNLO corrections at larger $x$. A particularly
interesting aspect of this fit is that the quality of the fit to Drell-Yan data
(not fitted), which was poor in all previous fits, improves considerably, especially for
the W asymmetry. This suggests that the accuracy of the charged-current data in
the HERA-I combined set is now sufficient to provide some handle on the flavour
decomposition of the sea at large $x$, which is only weakly constrained by neutral
current DIS data, and strongly constrained by DY data.

Figure 25: Distance between the IGA fit of Fig. 23 and a fit with improved treatment of
normalization uncertainties (IGA+$t_0$).

The distance between these fits is shown in Fig. 27: the impact of the combined
HERA data is a moderate but generalized improvement in accuracy at small $x$. The
effect on the singlet and the gluon at small $x$ is shown in Fig. 28. The sizable error
reduction in the small-$x$ singlet is especially interesting.

4. Impact of jet data. 

The addition of jet data to the 2.0-DIS fit leaves the quality of the global fit
unchanged. This demonstrates the perfect compatibility of jet data with DIS data: in
fact, the quality of the fit to jet data was quite good even in all previous fits, in which
they were not included in the fitted dataset. The distance between the 2.0-DIS and
2.0-DIS+JET fits, displayed in Fig. 29, shows that these data affect almost only the
gluon, as one would expect [50], leading to a better determination of it at medium
and large $x$. This is shown in Fig. 30, where the gluons of 2.0-DIS and 2.0-DIS+JET
are compared.

5. Impact of Drell-Yan data. 

The addition of Drell-Yan data to the 2.0-DIS+JET fit leaves the quality of the global
fit unchanged. Taken together with the previous comparison of the 2.0-DIS and
2.0-DIS+JET fits, this shows that DIS data and hadronic data are fully compatible,
and furthermore the two classes of hadronic data included here, DY and inclusive
jets, are compatible with each other. Minor incompatibilities only appear within
each dataset (typically due to some subset of data points or, in the case of Drell-Yan,
to the CDF W asymmetry and Z rapidity distribution data). However, the quality
of the fit to Drell-Yan data was generally poor when they were not included in the
fit, due to the fact that they are sensitive to the separation of individual flavours at
large $x$, which is only very weakly constrained by other data.

Figure 26: Comparison between PDFs from the IGA fit of Fig. 23 and a fit with improved
treatment of normalization uncertainties (IGA+$t_0$) (the distances are shown in Fig. 25): small-$x$
gluon, small-$x$ singlet and triplet (from left to right).

The distances between the 2.0-DIS+JET and the full NNPDF2.0 fits, displayed in
Fig. 31, show the sizable impact of the Drell-Yan data on all valence-like PDF
combinations at medium and large $x$: the triplet, the valence, the sea asymmetry
and the strangeness asymmetry. The significant improvement in accuracy on all
these PDFs is apparent in Fig. 32. The remarkable improvement in the accuracy of
the determination of the strangeness asymmetry $s^-(x)$ will turn out to have relevant
phenomenological implications for the so-called NuTeV anomaly, as we discuss in
Sect. 6.2.
Figure 27: Distance between the IGA+$t_0$ fit of Fig. 25 and a fit in which the separate H1 and
ZEUS data are replaced by the combined HERA-I DIS data (NNPDF2.0 DIS).



Finally, we have produced two further PDF sets: one with the full NNPDF2.0 dataset,
but with HERA-I combined DIS data replaced by the previous separate H1 and ZEUS
data; and the other with DIS+DY data only. In both cases, we see that the impact of 
the new data is independent of the dataset to which they are added: so for instance the 
improvement in accuracy in the valence sector due to DY data is independent of their 
being added to a dataset that does or does not contain jet data. 

The main conclusion of this analysis is that we see no sign of tension between datasets.
To understand this, consider what would happen if, say, jet data were incompatible with
Drell-Yan data: then, we should see a deterioration of the quality of the fit to Drell-Yan
when jets are included, and also we should see that the impact of jet data is bigger when
Drell-Yan data are not included and more moderate when they are included. None of these
effects is observed, for any of the combinations that have been tried here. Deterioration of
the fit quality to each individual dataset upon global fitting has been discussed in detail
in Ref. [65]: whereas small data incompatibilities may only be revealed by the more
sensitive method used in this reference, we see no evidence for the sizable incompatibilities
found there.
Figure 28: Comparison between PDFs from the IGA+$t_0$ fit of Fig. 25 and a fit in which the separate
H1 and ZEUS data are replaced by the combined HERA-I DIS data (NNPDF2.0 DIS) (the distances
are shown in Fig. 27): small-$x$ gluon and small-$x$ singlet (from left to right).
Figure 29: Distance between the NNPDF2.0 DIS fit of Fig. 27 and a fit in which jet data are also
included (NNPDF2.0 DIS+JET).
Figure 30: Comparison between PDFs from the NNPDF2.0 DIS fit of Fig. 27 and a fit in which jet
data are also included (NNPDF2.0 DIS+JET) (the distances are shown in Fig. 29): the gluon at
small and large $x$ (from left to right).
Figure 31: Distance between the NNPDF2.0 DIS+JET fit of Fig. 29 and the reference NNPDF2.0
fit (Drell-Yan data also included).
Figure 32: Comparison between PDFs from the NNPDF2.0 DIS+JET fit of Fig. 29 and the reference
NNPDF2.0 fit (Drell-Yan data also included) (the distances are shown in Fig. 31): triplet, valence,
sea asymmetry and strange valence (from left to right and from top to bottom).
5.5 Positivity constraints

As discussed in Sect. HI positivity of physical observables has been imposed, in particular 
for the longitudinal structure function $F_L(x, Q^2)$ and for the dimuon cross section through
a Lagrange multiplier, Eq. (54). In order to assess quantitatively the effect of the positivity
constraints, we have repeated the NNPDF2.0 parton determination without imposing
positivity, i.e. setting $\lambda_{\rm pos} = 0$ in Eq. (54).

In Fig. 33, PDFs with uncertainties determined as 68% confidence levels with and
without positivity constraints are compared. As discussed in Sect. 5.3, it is important
to perform the comparison with uncertainties determined as confidence levels rather than
standard deviations, because imposing positivity can lead to deviations from gaussian
behaviour. Clearly, positivity of $F_L(x,Q^2)$ leads to substantial uncertainty reduction in
the small-$x$ gluon. Note that there is nevertheless a kinematic region in which the gluon
goes negative by a small amount, though $F_L$ remains positive. Also, removing positivity
of the dimuon cross section would lead to a much softer strange sea at small $x$ with
rather larger uncertainties. This in turn leads to a softer small-$x$ singlet, also with larger
uncertainties. This is due to the fact that for $x < 0.01$, where no neutrino data are
available, positivity is the only constraint on the total strangeness $s^+$.
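
The difference between the two kinds of interval can be illustrated with a minimal sketch. It assumes a toy replica sample (a Gaussian truncated at zero, standing in for a quantity kept non-negative by a constraint); the sample, the function names and the use of the 16th and 84th percentiles for the central 68% interval are illustrative assumptions, not the NNPDF implementation.

```python
import numpy as np

def one_sigma_interval(replicas):
    """Symmetric interval: sample mean minus/plus sample standard deviation."""
    mean = np.mean(replicas)
    sigma = np.std(replicas, ddof=1)
    return mean - sigma, mean + sigma

def cl68_interval(replicas):
    """Central 68% interval: 16th and 84th percentiles of the replica sample."""
    lower, upper = np.percentile(replicas, [16.0, 84.0])
    return lower, upper

# Toy replica sample (an assumption, not NNPDF output): a Gaussian truncated at
# zero, mimicking a quantity that a positivity constraint keeps non-negative.
rng = np.random.default_rng(0)
raw = rng.normal(loc=0.5, scale=1.0, size=20000)
replicas = raw[raw > 0.0][:1000]

median = np.median(replicas)
sig_lo, sig_hi = one_sigma_interval(replicas)
cl_lo, cl_hi = cl68_interval(replicas)

print(f"one-sigma interval: [{sig_lo:.3f}, {sig_hi:.3f}]")
print(f"68% CL interval:    [{cl_lo:.3f}, {cl_hi:.3f}] (median {median:.3f})")
```

For a skewed sample of this kind the percentile interval is asymmetric about the median, which a mean plus or minus one standard deviation cannot represent.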

Finally, it is interesting to observe that positivity also has the effect of stabilizing the
replica sample: indeed, the 68% confidence levels computed without positivity display
some visible fluctuations which would only be smoothed out by using a significantly
larger replica sample. These fluctuations are absent when positivity is imposed, meaning
that such wide fluctuations in individual replicas are removed by the constraint.
Figure 33: NNPDF2.0 PDFs with and without positivity constraints: singlet, gluon and total
strangeness at small $x$ and total strangeness at large $x$. All uncertainty bands are determined as
68% confidence levels. PDFs not shown here are not affected by the positivity constraints.
5.6 Dependence on $\alpha_s$

The central NNPDF2.0 fit has been performed with $\alpha_s(M_Z) = 0.119$ in order to ease
comparison with the previous NNPDF1.0 and NNPDF1.2 fits, even though the current [67]
PDG average is $\alpha_s(M_Z) = 0.118 \pm 0.002$. In order to study the dependence of our results
on this choice, we have repeated the fit with $\alpha_s$ varied by one and two standard deviations
about this value, i.e. we have produced PDF sets with $\alpha_s(M_Z) = 0.115$, 0.117, 0.121 and
0.123.

In the previous NNPDF1.0 and NNPDF1.2 parton sets the dependence of PDFs on
$\alpha_s$ was found [H l68TfT0] to be noticeable but weak: when $\alpha_s$ was varied by
$\Delta\alpha_s = \pm 0.002$, most PDFs were found to be statistically indistinguishable from those
obtained with $\alpha_s$ fixed to its central value (i.e. to be at a distance $d \sim 1$ from them). The
gluon (and to a lesser extent the singlet PDF) was found to change in a statistically significant
way, but still within its uncertainty band, when $\alpha_s$ was varied in this range.

The dependence of NNPDF2.0 PDFs on $\alpha_s$ is shown in Fig. 34, where the ratios of the
four $\alpha_s$ PDF sets to the central set are shown for all PDFs except the total strangeness
$s^+$, which is found not to vary significantly. Clearly, all PDFs are still within the central
uncertainty band when $\Delta\alpha_s = \pm 0.002$. However, there now appears to be somewhat
greater sensitivity to $\alpha_s$. Firstly, now not only the gluon but also the triplet, singlet and
valence, when $\alpha_s$ is varied in the range $\Delta\alpha_s = \pm 0.002$, move close to the edge of the
one-$\sigma$ range for the central PDF. This corresponds to a distance $d \sim 7$, well above the threshold
of statistical significance, and even for the gluon it is a somewhat larger variation than
observed in NNPDF1.2. Furthermore, the triplet, which as discussed in Sect. 5.2 is now
determined very accurately, appears to be as sensitive as the gluon to the value of $\alpha_s$. The
increased sensitivity of quark distributions to the value of $\alpha_s$ is likely a consequence of the
inclusion of Drell-Yan data, which undergo large NLO corrections and are thus sensitive
to $\alpha_s$.

This increased sensitivity with respect to $\alpha_s$ suggests that the strong coupling could be
determined from the global PDF analysis with competitive accuracy, following a procedure
similar to that used to obtain the accurate determination of the CKM matrix element $|V_{cs}|$
of Ref. [6].
Figure 34: Ratios of PDFs with $\alpha_s$ varied in the range $0.115 \le \alpha_s \le 0.123$ to the central
NNPDF2.0 determination, compared to the PDF uncertainty band: the singlet at small and large
$x$, the gluon at small and large $x$ and the triplet, valence, sea asymmetry and strange valence (from
top to bottom).
6 Phenomenological implications



A full phenomenological study of the implications of NNPDF2.0 PDFs is beyond the scope 
of this paper. In this section we present some preliminary investigations: we compare to
the experimental data which have been included in the fit, then we discuss the implications
for the proton strangeness and in particular for the NuTeV anomaly, and finally we present
predictions for some LHC standard candles.



6.1 Comparison to experimental data 

The general quality of predictions obtained using NNPDF2.0 PDFs for the observables
which have been included in the fits has already been summarized in Table 10 and Fig. 12.
A direct comparison of the data with theoretical predictions for some of these observables
is shown in Fig. 35 (DIS and Drell-Yan) and Fig. 36 (inclusive jets).

In Drell-Yan observables, the improvement in accuracy of the prediction when going 
from NNPDF1.2 to NNPDF2.0 is apparent: in particular, the sea asymmetry, virtually 
unconstrained from DIS, is now very well constrained by the E866 ratio data. Also the 
uncertainty reduction in the W-asymmetry measurement shows the increase in the
precision of the determination of the quark decomposition in NNPDF2.0. In jet data, the
excellent agreement between data and theory seen from the $\chi^2$ of Tab. 10 is seen to hold
through the whole kinematical range for all bins in transverse momentum and rapidity.



6.2 The proton strangeness revisited 

In Ref. [6] a detailed study of the strangeness content in the proton was performed, with
particular emphasis on the precision determination of electroweak parameters. The
addition of fixed-target Drell-Yan data in the NNPDF2.0 PDF determination, together with
other improvements in the fit that have been discussed in Sect. 5.4, leads to significantly
stricter constraints on the shape of the strange PDFs $s^\pm(x)$, as shown in
Figs. 13-14: while remaining consistent with the NNPDF1.2 result, the new determinations
of $s^+$ and especially $s^-$ at large $x$ have a much reduced uncertainty.

Indeed, the strange momentum fraction $K_S$ and the strangeness asymmetry $R_S$ [6]
at $Q^2 = 20$ GeV$^2$ are

$$
K_S = \begin{cases}
0.71^{+\ldots}_{-\ldots}\,{}^{\rm stat} \pm 0.26^{\rm syst} & \text{(NNPDF1.2)} \\
0.503 \pm 0.075^{\rm stat} & \text{(NNPDF2.0)}
\end{cases} \tag{59}
$$

$$
R_S = \begin{cases}
0.006 \pm 0.045^{\rm stat} \pm 0.010^{\rm syst} & \text{(NNPDF1.2)} \\
0.019 \pm 0.008^{\rm stat} & \text{(NNPDF2.0)}
\end{cases} \tag{60}
$$

i.e. the PDF uncertainty on $K_S$ is reduced by more than a factor two, while that on $R_S$
is reduced by a factor 5, with all results consistent within uncertainties. We have made
no attempt to provide a new determination of systematic and theoretical uncertainties on
$R_S$, which are now comparable to the reduced statistical uncertainties, but they should
be similar to those determined in Ref. [6] and quoted in Eqs. (59)-(60).

The distribution of $K_S$ values for 1000 NNPDF2.0 replicas is shown in Fig. 37: in
comparison to the analogous plot in Ref. [6], the narrower distribution which we now get
is closer to gaussian and indeed, unlike in Ref. [6], we now find no difference between the
68% confidence level and (symmetric) one-$\sigma$ intervals.

Figure 35: Comparison between data and NLO predictions obtained using NNPDF1.0,
NNPDF1.2 and NNPDF2.0 PDFs, for several DIS and Drell-Yan observables included in the
NNPDF2.0 fit. From top to bottom and from left to right: the $F_2^p$ structure function and the
$F_2^d/F_2^p$ ratio (NMC), the inclusive neutrino cross-section (CHORUS), the Drell-Yan rapidity
distribution (E866p), the W asymmetry (CDF) and the Drell-Yan p/d ratio (E866). For the purposes of
this plot only, experimental statistical and systematic uncertainties have been added in quadrature.

The implications of the accurate determination Eq. (60) of the strangeness asymmetry
$R_S$ for the so-called NuTeV anomaly [20] are striking: in Fig. 38 we compare the NuTeV
determination of the Weinberg angle [19], uncorrected or corrected for the strangeness
asymmetry as discussed in Ref. [6] using the values of $R_S$ in Eq. (60), and the result of a global
electroweak fit [71]. The two corrected values, unlike the uncorrected NuTeV value, are
in perfect agreement with the electroweak fit and with each other. However, while the
uncertainty on the Weinberg angle with NNPDF1.2 correction was considerably larger, the
uncertainty after NNPDF2.0 correction is comparable to that on the uncorrected value.
Indeed, Eq. (60) provides 1-$\sigma$ evidence for a non-zero and positive strangeness asymmetry
in the nucleon. While such an asymmetry was previously advocated as a possible
explanation of the NuTeV anomaly [20], evidence for it [HJ[T9][72][73] was so far inconclusive,
and it is being established here for the first time.

Figure 36: Comparison between data and NLO predictions obtained using NNPDF2.0 PDFs,
for inclusive jet production from D0 and CDF Run II. Data points are ordered in rapidity and
in transverse momentum from left to right. Experimental statistical and systematic uncertainties
have been added in quadrature for this plot. The NLO theoretical prediction has been obtained
using the FastNLO code.
Figure 37: Distribution of $K_S$ at $Q^2 = 20$ GeV$^2$ computed from the reference set of
$N_{\rm rep} = 1000$ NNPDF2.0 PDF replicas. The central region corresponds to the 68% confidence interval,
$K_S(Q^2 = 20\ {\rm GeV}^2) = 0.503 \pm 0.075$ (stat), which coincides with the 1-$\sigma$ interval, Eq. (59).
Figure 38: Determination of the Weinberg angle from the uncorrected NuTeV data [19], with
$[S^-]$ correction using NNPDF1.2 (Eq. (60)) and NNPDF2.0 (Eq. (60)) results, and from a global
electroweak fit [71]. Note that statistical uncertainties only are included in the NNPDF2.0
correction.

6.3 Parton luminosities 

In order to highlight the impact of parton distributions at the LHC, the parton-parton
luminosities (also called partonic fluxes) are relevant [74,75]; of particular interest are the sizes
of PDF uncertainties in parton luminosities from different PDF sets.

We can define three relevant combinations of PDF luminosities for the production of
a massive object with mass $M_X$ in hadronic collisions as follows:




with $\tau \equiv M_X^2/s$ and $\sqrt{s}$ the center of mass energy of the hadronic collision.
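
As an illustration of how such a luminosity is evaluated in practice, the following minimal sketch computes a gluon-gluon luminosity numerically. It rests on two assumptions that do not come from this text: the commonly used form $\Phi_{gg} = \frac{1}{s}\int_\tau^1 \frac{dx}{x}\, g(x)\, g(\tau/x)$, and a purely hypothetical toy gluon density with no scale dependence. The function names are likewise illustrative.

```python
import numpy as np

def toy_gluon(x):
    """Hypothetical gluon density g(x) = x^(-1.2) (1 - x)^5; not a fitted PDF."""
    # Clip guards against tiny negative values of (1 - x) from rounding at x = 1.
    return x ** (-1.2) * np.clip(1.0 - x, 0.0, None) ** 5

def gg_luminosity(m_x, sqrt_s, n_points=2001):
    """Assumed form: Phi_gg = (1/s) * int_tau^1 dx/x g(x) g(tau/x), tau = M_X^2 / s."""
    s = sqrt_s ** 2
    tau = m_x ** 2 / s
    # Substituting t = ln(x) turns dx/x into dt on the interval [ln(tau), 0].
    t = np.linspace(np.log(tau), 0.0, n_points)
    x = np.exp(t)
    integrand = toy_gluon(x) * toy_gluon(tau / x)
    # Trapezoidal rule written out explicitly.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))
    return integral / s

# Ratio of the toy luminosity at 7 TeV to that at 14 TeV (energies in GeV).
r_low = gg_luminosity(100.0, 7000.0) / gg_luminosity(100.0, 14000.0)
r_high = gg_luminosity(2000.0, 7000.0) / gg_luminosity(2000.0, 14000.0)
print(f"7 TeV / 14 TeV ratio at M_X = 100 GeV:  {r_low:.3f}")
print(f"7 TeV / 14 TeV ratio at M_X = 2000 GeV: {r_high:.3f}")
```

Even with this toy input the ratio falls with increasing mass, because a larger $\tau$ pushes the integrand towards large $x$, where the density drops steeply.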

In Fig. 39 we show the various partonic luminosities Eq. (61) at the LHC as computed
with the NNPDF2.0 set. It is clear that at low masses the GG and GQ channels are both
important, while at large masses the GQ channel dominates. Also in Fig. 39 we show the
ratio of partonic luminosities between LHC 14 TeV and 7 TeV. While at small masses the
loss in partonic luminosity is roughly a factor two, it can be as large as a factor ten or
more at large masses. The gluon-gluon luminosity is the channel which suffers the greatest
reduction. Turning now to the uncertainties on parton luminosities due to PDFs, in Fig. 40
we compare the relative PDF uncertainties (normalized to the respective central set) in
various channels of PDF luminosity for the NNPDF2.0, CTEQ6.6 and MSTW08 sets. In
the GG channel, all PDF sets agree in the central mass region, and NNPDF2.0 is close
to MSTW08 in general. In the QQ channel all PDF sets yield very similar uncertainties
at small and medium masses. It is also clear from Fig. 40 that at 7 TeV the restricted
$x$-range of the partons leads to sizably larger PDF uncertainties at large values of $M_X$.

Figure 39: Parton-parton luminosities Eq. (61) in the various partonic channels, computed from
the NNPDF2.0 set at the LHC for $\sqrt{s} = 14$ TeV (above) and ratio of results for 7 TeV and 14 TeV
(below).
Figure 40: Relative PDF uncertainties on parton-parton luminosities Eq. (61) for the NNPDF2.0,
CTEQ6.6 and MSTW2008 PDF sets, as function of the mass of the produced heavy object $M_X$
at the LHC for 14 TeV (left) and 7 TeV (right). From top to bottom, the gluon-gluon luminosity,
the gluon-quark luminosity and the quark-quark luminosity are shown.



6.4 LHC standard candles 

The total cross sections at the LHC with $\sqrt{s} = 14$ TeV for W, Z, H and $t\bar{t}$
production computed at NLO with MCFM [76-79] and NNPDF2.0, NNPDF1.2, CTEQ6.6, and
MSTW08 PDFs are compared in Table 12 and Fig. 41. Values obtained using NNPDF2.0
are in excellent agreement with those from NNPDF1.2, with significantly smaller
uncertainties. The predictions from previous NNPDF sets were discussed in [6].

| PDF set  | $\sigma(W^+)\,{\rm Br}(W^+ \to l^+\nu_l)$ | $\sigma(W^-)\,{\rm Br}(W^- \to l^-\bar{\nu}_l)$ | $\sigma(Z^0)\,{\rm Br}(Z^0 \to l^+l^-)$ |
|----------|------------------|------------------|------------------|
| NNPDF1.2 | 11.99 ± 0.34 nb  | 8.47 ± 0.21 nb   | 1.94 ± 0.04 nb   |
| NNPDF2.0 | 11.57 ± 0.19 nb  | 8.52 ± 0.14 nb   | 1.93 ± 0.03 nb   |
| CTEQ6.6  | 12.41 ± 0.28 nb  | 9.11 ± 0.22 nb   | 2.07 ± 0.05 nb   |
| MSTW08   | 12.03 ± 0.22 nb  | 9.09 ± 0.17 nb   | 2.03 ± 0.04 nb   |

| PDF set  | $\sigma(t\bar{t})$ | $\sigma(H, m_H = 120\ {\rm GeV})$ |
|----------|--------------|----------------|
| NNPDF1.2 | 901 ± 21 pb  | 36.6 ± 1.2 pb  |
| NNPDF2.0 | 913 ± 17 pb  | 37.3 ± 0.4 pb  |
| CTEQ6.6  | 844 ± 17 pb  | 36.3 ± 0.9 pb  |
| MSTW08   | 905 ± 18 pb  | 38.4 ± 0.5 pb  |

Table 12: Cross sections for W, Z, $t\bar{t}$ and Higgs production at the LHC at $\sqrt{s} = 14$ TeV.
All quantities have been computed at NLO using MCFM [76-79] with default settings for the
NNPDF1.2, NNPDF2.0, CTEQ6.6 and MSTW08 PDF sets. All uncertainties shown are
one-sigma. The Higgs cross section corresponds to the gluon-gluon fusion production channel.

It was already observed in Ref. [4] that NNPDF results for W and Z production
agree with those of CTEQ6.1, but undershoot the CTEQ6.5 and CTEQ6.6 predictions by
more than 5%. The main difference between CTEQ6.5/CTEQ6.6 and CTEQ6.1 is that
charm mass effects are included in the former pair of fits, but not in the latter, and are
also not included in any available NNPDF fit. This suggests that charm mass effects may be
responsible for the discrepancy between the CTEQ6.6 and NNPDF predictions for W and
Z cross sections. It should be noticed however that NNPDF1.0 results do agree [3] with
MRST01 [80], and do not agree with MSTW08 (as is clear from Table 12) despite the
fact that charm mass effects are included both in MRST01 and MSTW08. The pattern for
Higgs and $t\bar{t}$ production is even less clear, with NNPDF in good agreement with MSTW08
but not CTEQ6.6 for the former, and in good agreement with CTEQ6.6 but not MSTW08
for the latter.
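
The size of the W and Z discrepancy can be checked directly from the central values in Table 12. The short sketch below does this for NNPDF2.0 against CTEQ6.6. The pull, which divides the difference by the quadrature sum of the two quoted uncertainties, is an illustrative assumption that treats the two PDF uncertainties as uncorrelated; it is not a measure used in the text.

```python
# Central values and one-sigma PDF uncertainties in nb, transcribed from Table 12.
TABLE12_NB = {
    "W+": {"NNPDF2.0": (11.57, 0.19), "CTEQ6.6": (12.41, 0.28)},
    "W-": {"NNPDF2.0": (8.52, 0.14), "CTEQ6.6": (9.11, 0.22)},
    "Z0": {"NNPDF2.0": (1.93, 0.03), "CTEQ6.6": (2.07, 0.05)},
}

def relative_difference(reference, other):
    """Fractional shift of the other central value with respect to the reference."""
    return (other[0] - reference[0]) / reference[0]

def pull(reference, other):
    """Difference in units of the quadrature-summed uncertainties (assumed uncorrelated)."""
    return (other[0] - reference[0]) / (reference[1] ** 2 + other[1] ** 2) ** 0.5

rel = {}
for process, sets in TABLE12_NB.items():
    rel[process] = relative_difference(sets["NNPDF2.0"], sets["CTEQ6.6"])
    p = pull(sets["NNPDF2.0"], sets["CTEQ6.6"])
    print(f"{process}: CTEQ6.6 is {100.0 * rel[process]:.1f}% above NNPDF2.0 (pull {p:.1f})")
```

For all three processes the CTEQ6.6 central value lies about 7% above the NNPDF2.0 one, consistent with the statement that the difference exceeds 5%.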

Note however that most of these cross sections are quite sensitive to the value of $\alpha_s$,
and some of them extremely sensitive: for example, the contribution to the Higgs cross
section from gluon-gluon fusion varies by about 5% when $\alpha_s$ is varied by 2%. The results
shown in Table 12 and Fig. 41 have been obtained with the default settings of MCFM, and
in particular with the value of $\alpha_s$ corresponding to each group's central parton fit, namely
$\alpha_s(M_Z) = 0.118$ for CTEQ6.6 and $\alpha_s(M_Z) = 0.120$ for MSTW08 (and $\alpha_s(M_Z) = 0.119$
for NNPDF2.0). Hence, benchmarking of these cross sections with the same value of all
parameters including $\alpha_s$ should be performed before conclusions can be drawn.

It should finally be noticed that some approximations used in the MSTW08 and
CTEQ6.6 PDF determinations but not by NNPDF could have an impact on these
observables, such as the use of K-factors in fitting Drell-Yan data by both MSTW and
CTEQ, the use of a restrictive small-$x$ parametrization of the gluon by CTEQ, and the
use of very restrictive parametrizations of strangeness by both MSTW and CTEQ. In
summary, while the lack of inclusion of heavy quark terms may be responsible for some
of the discrepancies observed in Table 12 and Fig. 41, it cannot be the only explanation
(it cannot account for cases in which NNPDF agrees with CTEQ but not MSTW or
conversely). The issue should be re-examined after the inclusion of heavy quark mass effects
in NNPDF, ideally within a systematic benchmarking of parton distributions.
Figure 41: Graphical comparison of the cross sections from Table 12.
7 Conclusions and outlook



The NNPDF2.0 parton determination is the first global parton determination based on 
NNPDF methodology, and it is also the first global parton determination in which NLO 
QCD is used consistently throughout, without resorting to the K-factor approximation.
We have seen that the NNPDF methodology can accommodate a complex combination of 
DIS and hadronic datasets without any particular difficulties: in fact, the only bottleneck 
in the implementation of the NNPDF2.0 global fit has been computational, requiring the 
development of the FastKernel method discussed in Sect. [3] in order for DGLAP evolution 
and the computation of physical observables to be fast enough. 

In previous NNPDF work it was shown that NNPDF parton determinations behave 
in a statistically consistent way upon the subsequent inclusion of new data, without any 
adjustment being required as the new data are included, and with uncertainties decreasing 
upon the addition of new information, or at most remaining constant when inconsistent 
data are added. Here we have seen that this remains true when hadronic and deep-inelastic 
data are combined. In fact, we have found complete consistency between DIS and hadronic 
data, with some hadronic data (jets) being reasonably well predicted by the DIS fit and 
leading to small improvements, and other hadronic data (Drell-Yan) introducing new 
information which allows a quantitative determination of some PDF combinations that 
were determined with moderate or poor accuracy by DIS data, such as the light quark sea 
asymmetry. 

Progress has been made recently towards the inclusion of heavy quark mass effects 
in the NNPDF framework [81] and in the benchmarking of different approaches for the 
inclusion of heavy quark mass effects [70]. Once these are included in a global NNPDF fit,
accurate and reliable NLO phenomenology at the LHC will be possible.



The NNPDF2.0 PDFs (sets of $N_{\rm rep} = 100$ and 1000 replicas), as well as several of the
sets based on reduced or different datasets discussed in Sect. 5.4 (old HERA-I data, DIS
only, DIS+JET only, DIS+DY only, sets of $N_{\rm rep} = 100$ replicas), and also sets determined
using all values of $0.114 \le \alpha_s(M_Z) \le 0.124$ in steps of $\Delta\alpha_s(M_Z) = 0.001$, are available
from the NNPDF web site,

http://sophia.ecm.ub.es/nnpdf .

They are also available through the LHAPDF interface [8].
A Distances between PDFs: definition and meaning



Given two sets of $N^{(1)}_{\rm rep}$ and $N^{(2)}_{\rm rep}$ replicas, one is often interested in knowing whether
they correspond to different instances of the same underlying probability distribution, or
whether instead they come from different underlying distributions. Of course, for finite
$N^{(i)}_{\rm rep}$ this question can only be answered in a statistical sense. To this purpose, we define
the square distance between two estimators based on the given samples as the square
difference between the estimators divided by its expectation value, i.e. divided by the
corresponding variance. By construction, the expectation value of the square distance
is one.

The following cases are of particular interest: 



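• Expected value

Given a set of $N^{(i)}_{\rm rep}$ replicas $q^{(i)}_k$ of some quantity $q$, the estimator for the expected
(true) value of $q$ is the mean

$$
\langle q^{(i)} \rangle = \frac{1}{N^{(i)}_{\rm rep}} \sum_{k=1}^{N^{(i)}_{\rm rep}} q^{(i)}_k . \tag{62}
$$

The square distance between the two estimates of the expected value obtained from
sets $q^{(1)}_k$, $q^{(2)}_k$ is then

$$
d^2\left( \langle q^{(1)} \rangle, \langle q^{(2)} \rangle \right) =
\frac{\left( \langle q^{(1)} \rangle - \langle q^{(2)} \rangle \right)^2}
{\sigma^2_{(1)}[\langle q^{(1)} \rangle] + \sigma^2_{(2)}[\langle q^{(2)} \rangle]} , \tag{63}
$$

where the variance of the mean is given by

$$
\sigma^2_{(i)}[\langle q^{(i)} \rangle] = \frac{1}{N^{(i)}_{\rm rep}}\, \sigma^2_{(i)}[q^{(i)}] \tag{64}
$$

in terms of the variance $\sigma^2_{(i)}[q^{(i)}]$ of the variables $q^{(i)}$ (which a priori could come from
two distinct probability distributions). We estimate the variance of the mean from
the variance of the replica sample as

$$
\sigma^2_{(i)}[q^{(i)}] = \frac{1}{N^{(i)}_{\rm rep} - 1} \sum_{k=1}^{N^{(i)}_{\rm rep}}
\left( q^{(i)}_k - \langle q^{(i)} \rangle \right)^2 , \tag{65}
$$

with $\langle q^{(i)} \rangle$ given by Eq. (62).
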
• Uncertainty 

Given a set of $N^{(i)}_{\rm rep}$ replicas $q^{(i)}_k$ of some quantity $q$, the estimator for the square 
uncertainty of $q$ is the variance of the replica sample given by Eq. (65). The distance 
between the two estimates of the square uncertainty obtained from sets $q^{(1)}_k$, $q^{(2)}_k$ is 
then 

$$d^2\left(\sigma^2_{(1)}, \sigma^2_{(2)}\right) = \frac{\left(\bar\sigma^2_{(1)} - \bar\sigma^2_{(2)}\right)^2}{\sigma^2_{(1)}[\bar\sigma^2_{(1)}] + \sigma^2_{(2)}[\bar\sigma^2_{(2)}]} , \tag{66}$$

where for brevity we have defined 

$$\bar\sigma^2_{(i)} \equiv \sigma^2_{(i)}[q^{(i)}] . \tag{67}$$



The variances $\sigma^2_{(i)}[\bar\sigma^2_{(i)}]$ of the square uncertainties could also be estimated from the 
replica sample, by computing the variance from various subsets of the given replica 
sample, and then the variance of these resulting variances as the subset is varied; for 
a finite number of replicas this may lead to a loss of statistical accuracy. For simplicity 
here we use instead the expression [67] 

$$\sigma^2_{(i)}[\bar\sigma^2_{(i)}] = \frac{1}{N^{(i)}_{\rm rep}} \left[ m_4[q^{(i)}] - \frac{N^{(i)}_{\rm rep} - 3}{N^{(i)}_{\rm rep} - 1} \left( \bar\sigma^2_{(i)} \right)^2 \right] , \tag{68}$$

where as above $\bar\sigma^2_{(i)}$ is estimated using Eq. (65), while the fourth moment of the 
probability distribution is estimated from the corresponding moment of the replica 
sample (which provides an estimate of it which is only asymptotically unbiased): 

$$m_4[q^{(i)}] = \frac{1}{N^{(i)}_{\rm rep}} \sum_{k=1}^{N^{(i)}_{\rm rep}} \left( q^{(i)}_k - \langle q^{(i)} \rangle \right)^4 . \tag{69}$$
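The two distance estimators can be written down directly from the sample moments. The following Python sketch is illustrative only and is not the code used for the fits: all function names are hypothetical, and the variance of the sample variance is assumed to take the standard form $(m_4 - \frac{N-3}{N-1}\sigma^4)/N$ with $m_4$ the fourth central moment of the sample.

```python
def mean(sample):
    """Replica average, the estimator of the expected value (Eq. 62)."""
    return sum(sample) / len(sample)

def variance(sample):
    """Unbiased variance of the replica sample (Eq. 65)."""
    m = mean(sample)
    return sum((q - m) ** 2 for q in sample) / (len(sample) - 1)

def fourth_moment(sample):
    """Fourth central moment of the replica sample (Eq. 69)."""
    m = mean(sample)
    return sum((q - m) ** 4 for q in sample) / len(sample)

def var_of_variance(sample):
    """Variance of the sample variance (assumed standard expression)."""
    n = len(sample)
    s2 = variance(sample)
    return (fourth_moment(sample) - (n - 3) / (n - 1) * s2 ** 2) / n

def d2_mean(s1, s2):
    """Square distance between the two central values (Eq. 63)."""
    denom = variance(s1) / len(s1) + variance(s2) / len(s2)
    return (mean(s1) - mean(s2)) ** 2 / denom

def d2_variance(s1, s2):
    """Square distance between the two square uncertainties (Eq. 66)."""
    denom = var_of_variance(s1) + var_of_variance(s2)
    return (variance(s1) - variance(s2)) ** 2 / denom

# Toy replica samples (made-up numbers, for illustration only).
sample_1 = [1.0, 2.0, 3.0, 4.0]
sample_2 = [2.0, 3.0, 4.0, 5.0]
d2_central = d2_mean(sample_1, sample_2)
d2_spread = d2_variance(sample_1, sample_2)
```

In this toy case the two samples differ only by a rigid shift, so the distance between uncertainties vanishes while the distance between central values does not.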



In practice, for small-sized replica samples the distances defined in Eq. (63) and 
Eq. (66) display sizable statistical fluctuations. In order to stabilize the result, all 
distances computed in this paper are determined as follows: we randomly pick $N^{(i)}_{\rm rep}/2$ out 
of the $N^{(i)}_{\rm rep}$ replicas for each of the two subsets. The computation of the square distance 
Eq. (63) or Eq. (66) is then repeated for $N_{\rm part} = 100$ (randomly generated) choices of 
$N_{\rm rep}/2$ replicas, and the result is averaged: this is sufficient to bring the statistical 
fluctuations of the distance to the level of a few percent. The distances shown in Sect. 5 are 
the square root of this average, computed taking for $q^{(i)}$ the value of some PDF at fixed $x$ 
and $Q^2$ obtained from a given pair of fits. Throughout Sect. 5 the choice $Q^2 = Q_0^2 = 2$ GeV$^2$ 
is always adopted. 
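The stabilization procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the collaboration's code: the function names, the fixed random seed and the Gaussian toy "fits" are all hypothetical, and only the distance between central values is shown.

```python
import math
import random

def d2_mean(s1, s2):
    """Square distance between central values of two replica samples."""
    def mean(s):
        return sum(s) / len(s)
    def var_of_mean(s):
        m = mean(s)
        return sum((q - m) ** 2 for q in s) / (len(s) - 1) / len(s)
    return (mean(s1) - mean(s2)) ** 2 / (var_of_mean(s1) + var_of_mean(s2))

def stabilized_distance(s1, s2, d2=d2_mean, n_part=100, seed=0):
    """Average the square distance over n_part random half-samples
    of each set, then take the square root of the average."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_part):
        half_1 = rng.sample(s1, len(s1) // 2)  # N_rep/2 replicas, no repeats
        half_2 = rng.sample(s2, len(s2) // 2)
        total += d2(half_1, half_2)
    return math.sqrt(total / n_part)

# Toy example: 100 Gaussian "replicas" per fit (made-up data).
gen = random.Random(1)
fit_a = [gen.gauss(0.0, 1.0) for _ in range(100)]
fit_b = [gen.gauss(0.0, 1.0) for _ in range(100)]
fit_c = [q + 3.0 for q in fit_b]  # same spread, shifted central value

d_same = stabilized_distance(fit_a, fit_b)
d_shift = stabilized_distance(fit_a, fit_c)
```

With samples drawn from the same parent distribution the distance is of order one, while a shift of the central value by several standard deviations gives a much larger distance.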

The distance defined in this way measures whether the given samples do or do not 
come from the same underlying probability distribution, and in particular Eq. (63) and 
Eq. (66) test whether the two distributions from which the two samples are taken have 
respectively the same mean and the same standard deviation. By construction, the 
probability distribution for the distance coincides with the $\chi^2$ distribution with one degree of 
freedom, and thus it has mean $\langle d \rangle = 1$, and $d < 2.3$ at 90% confidence level. 

Note that asking whether two PDF determinations come from the same underlying 
distribution is much more restrictive than asking whether they are consistent within un- 
certainties. Consider for instance the case of a pair of PDF determinations, such that the 
dataset on which one of the two is based is a subset of the dataset of the other, and such 
that all data are consistent with each other. These two determinations will clearly not 
come from the same underlying distribution, because the distribution of PDFs obtained 
from the wider dataset will have smaller uncertainty. However, if the data are consistent 
they will remain nevertheless consistent within uncertainties. 

In particular, the determination of moments of the underlying distribution becomes 
more precise as the number of replicas is increased: e.g. the accuracy in the determination of 
the expectation value scales as $1/\sqrt{N_{\rm rep}}$, compare Eq. (64), so if the underlying probability 
distributions are different the distance will grow as $\sqrt{N_{\rm rep}}$ in the large $N_{\rm rep}$ limit. In this 
limit (in which the central values of the underlying distribution are accurately estimated 
by the mean over the replica sample) the distance between central values is given by the 
distance rescaled by $\sqrt{N_{\rm rep}}$: otherwise stated, if $N^{(1)}_{\rm rep} = N^{(2)}_{\rm rep} = N_{\rm rep}$, then 

$$\frac{d\left(\langle q^{(1)} \rangle, \langle q^{(2)} \rangle\right)}{\sqrt{N_{\rm rep}}} = \frac{\left| \langle q^{(1)} \rangle - \langle q^{(2)} \rangle \right|}{\sqrt{\sigma^2_{(1)}[q^{(1)}] + \sigma^2_{(2)}[q^{(2)}]}} \tag{70}$$

provides, in the large $N_{\rm rep}$ limit, the difference between central values in units of the 
standard deviation. It follows that because of the halving of the size of the sample required 
for averaging as discussed above, for all distances shown in Sect. 5 and computed with 
$N_{\rm rep} = 100$ replicas, one sigma corresponds to $d = \sqrt{50} \approx 7$. 
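As a quick numerical check of the last statement, the conversion from a distance between central values to a shift in units of the standard deviation can be sketched as follows. The helper name is hypothetical; it assumes the large $N_{\rm rep}$ limit, equal-sized samples, and half-samples of $N_{\rm rep}/2$ replicas, with the "standard deviation" understood as the combined one, $\sqrt{\sigma^2_{(1)} + \sigma^2_{(2)}}$.

```python
import math

def distance_in_sigma(d, n_rep=100):
    """Convert a distance between central values into units of the
    combined standard deviation, assuming half-samples of n_rep // 2
    replicas were used to compute d (large-N_rep limit)."""
    return d / math.sqrt(n_rep // 2)

one_sigma_distance = math.sqrt(50)          # about 7.07
shift_for_d7 = distance_in_sigma(7.0)       # slightly below one sigma
```

For instance, a distance of 14 computed with 100 replicas would correspond to central values about two standard deviations apart under these assumptions.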
B Drell-Yan observables 



We provide here the full expressions for the Drell-Yan observables included in the NNPDF2.0 
analysis (both virtual photon and vector boson production). We adopt the notations and 
conventions of Refs. [54,55]. For explicit expressions of the inclusive jet cross-sections, we 
refer to the documentation of the FastNLO project, from which we took the 
precomputed tables [16]. 



B.1 Rapidity and $x_F$ distributions 



The leading order parton kinematics was given in the main text. The rapidity distribution for 
the DY process can then be expressed at NLO as 



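$$\frac{d\sigma}{dM^2\,dy} = \frac{4\pi\alpha^2}{9 M^2 s} \sum_i e_i^2 \int_{x_1^0}^1 dx_1 \int_{x_2^0}^1 dx_2 \Bigg\{ \left[ D^{(0)}_{q\bar q}(x_1,x_2) + \frac{\alpha_s}{4\pi} D^{(1)}_{q\bar q}\left(x_1,x_2,\frac{M^2}{\mu_F^2}\right) \right] \left\{ q_i(x_1,\mu_F^2)\,\bar q_i(x_2,\mu_F^2) + \bar q_i(x_1,\mu_F^2)\,q_i(x_2,\mu_F^2) \right\} $$
$$\qquad + \frac{\alpha_s}{4\pi} D^{(1)}_{gq}\left(x_1,x_2,\frac{M^2}{\mu_F^2}\right) g(x_1,\mu_F^2) \left\{ q_i(x_2,\mu_F^2) + \bar q_i(x_2,\mu_F^2) \right\} + (1 \leftrightarrow 2) \Bigg\} . \tag{71}$$

The LO coefficient functions for this distribution are given by 

$$D^{(0)}_{q\bar q}(x_1,x_2) = \delta(x_1 - x_1^0)\,\delta(x_2 - x_2^0) ; \tag{72}$$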



the NLO contribution is explicitly given in Ref. 

For their practical implementation we exploited the following standard identities: 



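$${\rm Li}_2(x) = -\int_0^x dt\, \frac{\ln(1-t)}{t} , \tag{73}$$

$$\int_x^1 dt\, \frac{f(t)}{(t-x)_+} = \int_x^1 dt\, \frac{f(t) - f(x)}{t-x} , \tag{74}$$

$$\int_x^1 dt\, f(t) \left[ \frac{\ln(1 - x/t)}{t-x} \right]_+ = \int_x^1 dt\, \left( f(t) - f(x) \right) \frac{\ln(1 - x/t)}{t-x} , \tag{75}$$

$$\int_{x_1}^1 dt_1 \int_{x_2}^1 dt_2\, \frac{f(t_1,t_2)}{\left[ (t_1 - x_1)(t_2 - x_2) \right]_+} = \int_{x_1}^1 dt_1 \int_{x_2}^1 dt_2\, \frac{f(t_1,t_2) - f(t_1,x_2) - f(x_1,t_2) + f(x_1,x_2)}{(t_1 - x_1)(t_2 - x_2)} . \tag{76}$$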
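The practical content of the plus-prescription identities is that the singular integrand is replaced by a subtracted, integrable one, which can then be handled by ordinary quadrature. The sketch below illustrates this with a plain composite midpoint rule; the quadrature choice and all function names are assumptions made for illustration and are not taken from the NNPDF code.

```python
import math

def midpoint(g, a, b, n=2000):
    """Composite midpoint rule; never evaluates g at the endpoints."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def plus_integral(f, x, n=2000):
    """int_x^1 dt f(t) / (t - x)_+ in its subtracted form."""
    fx = f(x)
    return midpoint(lambda t: (f(t) - fx) / (t - x), x, 1.0, n)

def log_plus_integral(f, x, n=2000):
    """int_x^1 dt f(t) [ln(1 - x/t) / (t - x)]_+ in its subtracted form."""
    fx = f(x)
    return midpoint(
        lambda t: (f(t) - fx) * math.log(1.0 - x / t) / (t - x), x, 1.0, n)

def double_plus_integral(f, x1, x2, n=200):
    """Two-dimensional plus prescription with the four-term subtraction."""
    def inner(t1):
        def g(t2):
            num = f(t1, t2) - f(t1, x2) - f(x1, t2) + f(x1, x2)
            return num / ((t1 - x1) * (t2 - x2))
        return midpoint(g, x2, 1.0, n)
    return midpoint(inner, x1, 1.0, n)

def dilog(x, n=2000):
    """Li_2(x) = -int_0^x dt ln(1 - t) / t, for 0 < x < 1."""
    return -midpoint(lambda t: math.log(1.0 - t) / t, 0.0, x, n)

# Simple test functions with known closed-form answers.
single = plus_integral(lambda t: t * t, 0.3)             # (1-x)(1+3x)/2
logarithmic = log_plus_integral(lambda t: t, 0.3)        # (1-x)ln(1-x) + x ln x
double = double_plus_integral(lambda a, b: a * b, 0.3, 0.4)  # (1-x1)(1-x2)
li2_half = dilog(0.5)                                    # pi^2/12 - ln(2)^2/2
```

The logarithmic case retains an integrable endpoint singularity after subtraction, so the midpoint rule converges more slowly there; a production implementation would presumably use an adaptive or Gaussian rule instead.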



Distributions in terms of Feynman $x_F$ are also frequently used: the leading order 
parton kinematics was given in the main text. The Drell-Yan $x_F$ distribution of lepton pairs at 
NLO is given by 
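$$\frac{d\sigma}{dM^2\,dx_F} = \frac{4\pi\alpha^2}{9 M^2 s} \sum_i e_i^2 \int_{x_1^0}^1 dx_1 \int_{x_2^0}^1 dx_2 \Bigg\{ \left[ \tilde D^{(0)}_{q\bar q}(x_1,x_2) + \frac{\alpha_s}{4\pi} \tilde D^{(1)}_{q\bar q}\left(x_1,x_2,\frac{M^2}{\mu_F^2}\right) \right] \left\{ q_i(x_1,\mu_F^2)\,\bar q_i(x_2,\mu_F^2) + \bar q_i(x_1,\mu_F^2)\,q_i(x_2,\mu_F^2) \right\} $$
$$\qquad + \frac{\alpha_s}{4\pi} \tilde D^{(1)}_{gq}\left(x_1,x_2,\frac{M^2}{\mu_F^2}\right) g(x_1,\mu_F^2) \left\{ q_i(x_2,\mu_F^2) + \bar q_i(x_2,\mu_F^2) \right\} + (1 \leftrightarrow 2) \Bigg\} , \tag{77}$$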



where the sum over $i$ runs over all $N_f$ quark flavours. 
The LO coefficient function is given by 

$$\tilde D^{(0)}_{q\bar q}(x_1,x_2) = \frac{\delta(x_1 - x_1^0)\,\delta(x_2 - x_2^0)}{x_1^0 + x_2^0} . \tag{78}$$

The NLO contribution coming from $q\bar q$ annihilation is explicitly given in Ref. 
B.2 Vector boson production 

For vector boson production at hadron colliders, the cross section is differential in a single 
variable y, the rapidity of the vector boson. The unpolarized vector boson production 
cross section at NLO is 



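$$\frac{d\sigma}{dy} = \frac{\pi G_F M_V^2 \sqrt{2}}{3 s} \sum_{i,j} c_{ij} \int_{x_1^0}^1 dx_1 \int_{x_2^0}^1 dx_2 \left[ D^{(0)}_{q\bar q}(x_1,x_2,x_1^0,x_2^0) + \frac{\alpha_s}{4\pi} D^{(1)}_{q\bar q}\left(x_1,x_2,x_1^0,x_2^0,\frac{M^2}{\mu_F^2}\right) \right] $$
$$\qquad \times \left\{ q_i(x_1,\mu_F^2)\,\bar q_j(x_2,\mu_F^2) + \bar q_i(x_1,\mu_F^2)\,q_j(x_2,\mu_F^2) \right\} , \tag{79}$$

where $c_{ij}$ are the electroweak couplings defined in Eq. (10). The coefficient functions in 
Eq. (79) are identical to those in the Drell-Yan rapidity distribution Eq. (71). 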

Note that for proton-antiproton collisions (such as at the Tevatron) one of the two 
parton distributions refers to a proton and the other to an antiproton, i.e. in practice 
one should replace $q_j(x_2) \to \bar q_j(x_2)$ and conversely in the above expression. Similarly for 
proton-nucleus collisions, where isospin symmetry of the nucleus target should be taken 
into account. 
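The replacements described in the last paragraph can be sketched by wrapping a proton PDF callable. Everything in this example is an illustrative assumption: the toy proton PDF shapes are made up, flavours are labelled with PDG-style integer codes (1 = d, 2 = u, negative for antiquarks, 21 for the gluon), and the nucleus is treated as a simple average over free protons and neutrons with exact isospin symmetry and no nuclear corrections.

```python
def toy_proton(flavour, x):
    """Hypothetical toy proton PDFs; the numbers are not fit results."""
    norms = {2: 2.0, 1: 1.0, -2: 0.2, -1: 0.3, 21: 3.0}
    return norms.get(flavour, 0.0) * (1.0 - x) ** 3

def antiproton(pdf):
    """Charge conjugation: a quark in the antiproton is an antiquark
    in the proton, and conversely; the gluon is unchanged."""
    def conjugated(flavour, x):
        return pdf(flavour if flavour == 21 else -flavour, x)
    return conjugated

def neutron(pdf):
    """Isospin symmetry: swap u <-> d and ubar <-> dbar."""
    swap = {1: 2, 2: 1, -1: -2, -2: -1}
    def rotated(flavour, x):
        return pdf(swap.get(flavour, flavour), x)
    return rotated

def nucleus(pdf, z, a):
    """Per-nucleon PDF of a nucleus with z protons and a - z neutrons."""
    neutron_pdf = neutron(pdf)
    def averaged(flavour, x):
        return (z * pdf(flavour, x) + (a - z) * neutron_pdf(flavour, x)) / a
    return averaged

pbar = antiproton(toy_proton)
deuteron = nucleus(toy_proton, z=1, a=2)
u_in_pbar = pbar(2, 0.1)          # equals ubar in the proton
u_in_deuteron = deuteron(2, 0.1)  # average of u and d in the proton
```

In a proton-antiproton cross section one would then evaluate the second beam's distributions with `pbar` instead of `toy_proton`, leaving the coefficient functions untouched.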